PREFACE.


The present course of Lectures on the Measurement of
Groups and Series deals with some of the most modern
methods of statistical research. Interesting as they were to
those who had the advantage of hearing them delivered,
they will doubtless, when studied at leisure in printed form,
prove even more interesting and useful.

These Lectures are the fifth of a Series originated in
1897, designed for the assistance of Actuarial Students in
connection with matters not included in the official Text
Books. Three of the Series deal with legal matters, and
one with the subject of Stock Exchange Securities. The
present course carries the range of topics into the field of
mathematics, and it is hoped that courses of lectures may
be hereafter provided dealing with other subjects, practical
and theoretical, relating to those branches of knowledge
which it is the province of the Institute of Actuaries to
promote and encourage.


W. H. 


24th February, 1903.

NOTE. In the following lectures I have made free use of those
statistical methods and formulae, which (though in many cases of recent
origin) may now perhaps be regarded as common property; but I hope
that I have not inadvertently quoted without reference, or misquoted,
investigations or theories, which may be regarded as personal to any of the
small body of statisticians working on the subject treated. The lectures
had to be prepared both for delivery and for the Press at short notice in the
midst of a busy session. This fact must be my apology for any obscureness,
unnecessary repetition, or clumsiness of arrangement or expression, which
may be found.



MEASUREMENT OF GROUPS. 


FIRST LECTURE. 


GENTLEMEN, it was with considerable diffidence that I
undertook to lecture to members of the Society of Actuaries
on a subject with which they may be presumed to be so
familiar. When I was asked if I could undertake these
lectures I had some difficulty in choosing a suitable subject,
and then it occurred to me that my audience were probably
concerned with the practical aspects of a question which I
was chiefly considering from the theoretical point of view, and
that it would therefore be most suitable if I endeavoured to
lay before them some theoretical considerations on subjects
which did not come in their ordinary course, but which were
allied to the subjects which naturally come before them, and
which are allied to those subjects on which I have spent a
certain amount of time and attention.

Groups. 

The first subject which I have selected is the measurement
of a group, the characteristics of a group, and its
representation. By a group I understand a number of persons or things
each of which possesses a measurable characteristic, the
group being arranged according to the magnitude of the
characteristic. For example, if I have returns of the wages
of a large number of people, and I group them according to
their wages, saying how many are earning 20s. to 25s., and
so forth, I shall have such a group; or if I choose a section of
the population and group them according to ages, I should
have another group of the kind I am thinking of. The
remarks I shall have to make about groups will, I think, be
fairly general, and apply in a very large range of problems;
but for convenience of illustration I shall confine myself to
only a small number.

The particular group I am taking for discussion this
evening is taken from the current Census, the number of
married women in the county of York, on Census day,
grouped according to their ages. In selecting a group for
discussion it must be large enough and small enough; it
must be sufficiently large to conceal individual peculiarities
or peculiarities of small sections; it must be sufficiently small
to be homogeneous. Both these limits are relative; a group
that is large enough for one purpose is too large for another
purpose; and a group that is homogeneous for one is
heterogeneous for another. The death rate of a whole country may
be sufficient for certain comparisons, but for other comparisons
you must subdivide according to districts and age. The group
that has been selected must be kept in view before any
arguments are based on the grouping and its measurement.

There are two main divisions of groups; those that are
derived from exact observations, and those which may be
regarded as samples of a larger group the whole of which has
not been measured. For purposes of reference I am calling
them Group α, when the observations are supposed to be
correct, as, for example, the number of persons who are in
receipt of a certain income; and Group β, where the numbers
are estimates; for example, an estimate of the number of
persons who may be expected to be in receipt of certain
incomes ten years hence, from an investigation of some [...]
now, or at some previous time. As regards Group α, our
chief work will be to select some method of abbreviating,
of describing in brief, each group; in the case of Group β
our work will be chiefly to criticise the correctness of the
statements, and to find methods which are [...]
applicable for its correction if it is not exact, to measure its
precision, and then afterwards to select some suitable method
of abbreviating it.


The Graphic Method.

The two chief methods of abbreviating or investigating
the characteristics of a group are the graphic method and
the method of averages. The method of averages should
perhaps be referred to first; but, since the use of diagrams in
explaining the meaning of averages is very considerable, I
have thought it better to take the method of diagrams first.
I have drawn out, in four different ways, the group already
named, the number of married women in the county of
York.


Ages of Wives present with their Husbands in the Registration
County of York, 1901.

    Between             No. per 1,000     Not more than     Per 1,000

    15 and 16 years          ·01          16 years old          ·01
    16  „  17   „            ·03          17    „               ·04
    17  „  18   „            ·2           18    „               ·26
    18  „  19   „           1·2           19    „              1·5
    19  „  20   „           3·1           20    „              5
    20  „  21   „           8             21    „             13
    21  „  25   „          75             25    „             88
    25  „  30   „         157             30    „            245
    30  „  35   „         162             35    „            407
    35  „  40   „         147             40    „            554
    40  „  45   „         125             45    „            679
    45  „  50   „         105             50    „            784
    50  „  55   „          80             55    „            864
    55  „  60   „          55             60    „            919
    60  „  65   „          40             65    „            959
    65  „  70   „          22             70    „            981
    70  „  75   „          14             75    „            995
    75  „  80   „           4             80    „            999
    Above 80                1                              1,000

    Total number included, 610,505.



There are shown in Diagrams I to IV the numbers of married
women in that county per thousand between these ages. The
total of wives in the county of York living with their husbands
was 610,000 odd. As is usual, the numbers are divided in years
between the ages of 15 to 21, and after that in five-yearly
groups. The first method of representing figures by diagram
is to place a dot in a given vertical position for each person
or item in question. This is indicated in Diagram I. The
method is not very important and is perfectly obvious. I
should only use it as a means of passing to another, if it were
not that in those classes of measurement where the quantities are
separated by a finite interval it is incorrect to use the methods
shown in Diagrams II, III and IV. If one was entering
the number of houses at particular rents in a town, where it
might perhaps be supposed that the rent always jumped by as
much as £2, one could represent properly the number of
houses at each £2 mark, but there would be no house at the
intermediate intervals, and it would be incorrect to proceed
any further to such a curve as would lead one to suppose that
the quantity dealt with was continuous. To take another
example, the railway service from one town to another might
be represented by a series of dots placed vertically over the
time taken by the train, measured horizontally, but not by the
following methods. If, however, the quantity is capable of
continuous variation, such as age or height, or if by a slight
extension of the meaning it may be regarded as being capable
of continuous variation, such as income, we may proceed to
the method of Diagram II.

[Diagram I. Point Diagram.]

In Diagram II rectangles are drawn whose heights are
the same as the height of corresponding lines of dots in
Diagram I, but the breadth is the unit of abscissa, in this
case five years. The areas can be regarded as representing a
number of persons. The area of the whole space enclosed
by the outer lines of the rectangles is, on the scale chosen,
4 square inches, which represents the whole of the population
considered; the breadth of each rectangle* is 1/5 inch, and
1/5 inch squared represents 1 per cent.

[Diagram II. Area Diagram.]
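The construction of Diagram II can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class figures are those of the table of wives per 1,000, the five-year breadth is taken as the unit of abscissa, and classes narrower than five years are given proportionally taller rectangles so that area, not height, stands for the number of persons. The names `CLASSES`, `UNIT` and `rectangle_heights` are introduced here and are not the lecturer's.

```python
# Age classes (lower limit, upper limit, number per 1,000), as read from the table.
CLASSES = [
    (15, 16, 0.01), (16, 17, 0.03), (17, 18, 0.2), (18, 19, 1.2),
    (19, 20, 3.1), (20, 21, 8), (21, 25, 75), (25, 30, 157),
    (30, 35, 162), (35, 40, 147), (40, 45, 125), (45, 50, 105),
    (50, 55, 80), (55, 60, 55), (60, 65, 40), (65, 70, 22),
    (70, 75, 14), (75, 80, 4),
]
UNIT = 5  # years; the unit of abscissa (an assumption matching the five-year breadth)


def rectangle_heights(classes, unit=UNIT):
    """Return (lower, upper, height) so that height * breadth-in-units = number."""
    return [(lo, hi, n * unit / (hi - lo)) for lo, hi, n in classes]


heights = rectangle_heights(CLASSES)
# The area of each rectangle (height times breadth in units) recovers the number.
total_area = sum(h * (hi - lo) / UNIT for lo, hi, h in heights)
tallest = max(heights, key=lambda r: r[2])
```

The open class "above 80" is left out of the sketch because it has no upper limit and so no definite breadth.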

Before we can go any further we have to make some
assumption as to the distribution of persons within the
five-yearly intervals selected. Even in my class (α), when the
observations are known to be correct, some assumption must
be made as to distribution before proceeding further. If, for
example, the correct set of measurements of the heights of a
regiment was given, every soldier being measured correctly
to the nearest [...] of an inch, no correction would be required
for actual mistakes, but before a continuous curve could be
drawn passing from one [...] inch to the next, some assumption
must be made as to distribution of heights, e.g., that
progression was uniform between the given points. In the case
of observations (β) which are only samples, it is still more
necessary to make some assumption as to continuity. In
order to consider what assumption is proper in the case in
question, remember that the facts given are simply the
numbers of persons whose ages lie between certain limits;
that is, we are given the area of the rectangle, or of the
figure which replaces the rectangle on each unit of base.
What we have to suppose is that the ages are subdivided not
merely into years, but into infinitesimal units of time; and
we have to make some assumption for guiding us in passing
from one of the given positions to the next. There are certain
positions which give definitely the number of persons below
20 years, below 25 years, and so forth. We have to find the
number of persons below 22, or any other assigned age. This
is a quite familiar idea; but there are one or two things in
connection with it which it is necessary to point out.

* The areas for ages below 25 are shown in more detail.

The Histogram and the Ogive.

If straight lines are drawn from the middle of each
horizontal line in Diagram II to the middle of the next, we
get the dotted line in Diagram III (called a histogram).
That is certain to be incorrect on two grounds. In the first
place, the area bounded by the lines nearest the highest point
is necessarily too small, for part of the area between the ages
30 and 35 is cut off by the dotted line, and nothing is placed
instead of it. Before pointing out the other way in which
the histogram is necessarily incorrect, we will pass on to
Diagram IV, which is thus constructed. At the ages shown
on the horizontal axis are drawn rectangles proportional to the
number of persons below that age; then we get a continually
ascending figure called an ogive, which is given as absolutely
correct in group α, the points at the corners of the steps
obtained in the figure being given by assumption. The
problem comes to be to draw some line or curve from these
fixed points that shall satisfy the conditions which we must
assign. Now, it would be necessarily wrong to join these
successive points by straight lines. If we take three corners,
A, B, C, not in a straight line, we get a sharp angle at B.
Introducing sharp angles there necessarily involves an error,
for they indicate discontinuity at certain arbitrary points,
which can correspond to no facts in nature. The angles
obtained in the histogram are erroneous for similar reasons.
If, as in group α, we are to suppose the observations to be
correct, a continuous line must be drawn through all the given
points which has no sharp angles in it, no sharp change
of curvature. If, as in group β, we are not bound to assume
that the observations are correct, the line may be drawn not
passing through the points, but near them. Many groups
may be represented with sufficient accuracy in rough work by
drawing a freehand curve passing through the given points;
it will be found that there is very little margin for drawing
such a curve, if the rule is made that the curvature is never
to be greater than necessary, that the direction is not changed
more rapidly than is necessary to pass through the points.
This condition, stated in mathematical language, supplies the
main problem of interpolation.

In group β we are not bound to assume that the curve
passes through all the points, and the question which is the
best curve (drawn freehand or otherwise) near the points,
needs the theory of probability for its discussion.

[Diagram III. Histogram.]
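The objection to joining the corners of the ogive by straight lines can be made concrete with a short Python sketch. It takes the cumulative figures per 1,000 at five-yearly ages from the table (an assumption about which points are used), computes the slope of each chord, and shows that the slope jumps at every corner. The names `OGIVE_POINTS`, `chord_slopes` and `slope_jumps` are illustrative only.

```python
# (age, number per 1,000 not older than that age), from the cumulative column.
OGIVE_POINTS = [(20, 5), (25, 88), (30, 245), (35, 407), (40, 554),
                (45, 679), (50, 784), (55, 864), (60, 919)]


def chord_slopes(points):
    """Slope (persons per 1,000 per year of age) of each straight chord."""
    return [(y1 - y0) / (x1 - x0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


slopes = chord_slopes(OGIVE_POINTS)
# A non-zero jump in slope at a corner is the "sharp angle" objected to above.
slope_jumps = [b - a for a, b in zip(slopes, slopes[1:])]
steepest_chord = slopes.index(max(slopes))  # index 2 is the chord from 30 to 35
```

A smooth interpolation curve has to pass through the same points while changing its direction gradually, which is the condition the following section expresses with a parabolic formula.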

Interpolation Curves.

As regards group α, to which discussion may be confined
for the present, where the curve is to pass through all the
points, I suggest the familiar method of interpolation by
parabolic formula. Take the equation

$$y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots$$

continued to as many terms as may be convenient. In the case
under discussion it would be inexpedient to take more than
4 or 5 terms, because we are fitting a definite algebraic curve
to irregular observations, and the law which underlies the
observations may very well change if we take a larger
period than 25 years. I have confined the work for this group
to 5 terms, continuing that series to $x^4$.

Consider what conditions that curve satisfies. Stopping at
the second term we have a straight line, which can only be
made to pass through two points. So we have to start afresh
at the second point, and thereby contradict the first assumption
which we make, that the increase is not subject to violent
changes, introducing an angle at any point. Introducing the
term in $x^2$ we have a parabola, which can be made to pass
through three points; the curve has continuous curvature, the
third differential, and the third differences obtained from the
values of y at consecutive equidistant values of x,
vanish; that is to say, there is no sudden change in curvature.
The first differential measures the inclination of the line; the
second measures the change of inclination, and if that is
constant there is a constant change of inclination, but no
sudden break. But we have no reason for assuming a constant
change of inclination, and the curve which passes through
three assigned points will not in general pass through the next
point. We then proceed to include further terms. If we take
the equation up to $x^3$ we can introduce a point of inflection,
which we cannot do with the parabola. If we take the
equation a step further we introduce two points of inflection,
and it is unnecessary to go as far as the fifth term in a
diagram like this. If we take 6 terms, the 6th differential and
the 6th difference vanish, the 5th difference is constant, and
there is no sudden break, and so on.

Now take the equation as far as the term $x^4$. We have
5 unknowns, and can determine them by assigning the
condition that the curve shall pass exactly through 5 points
on the diagram in question. I have calculated the equation
of the curve which will pass exactly through the 5 points of
the ogive, corresponding to 20, 25, 30, 35 and 40 years. The
method of calculation is a good deal facilitated by the use of
finite differences. Refer as origin for the abscissae to age 20,
take 5 years as unit, so that x is 1 at age 25, and write down
the equations which naturally arise, taking the numbers from
the last column of the table on p. 3.
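The calculation by finite differences can be sketched as follows. This is an illustration, not the lecturer's own working: it assumes the cumulative figures 5, 88, 245, 407 and 554 per 1,000 at ages 20 to 40, forms the leading differences, and expands Newton's forward-difference formula into powers of x. The function names and the use of exact fractions are choices made here.

```python
from fractions import Fraction

# Ogive values per 1,000 at ages 20, 25, 30, 35, 40 (x = 0, 1, 2, 3, 4).
YS = [5, 88, 245, 407, 554]


def forward_differences(ys):
    """Leading forward differences [y0, Δy0, Δ²y0, ...]."""
    leading, row = [], list(ys)
    while row:
        leading.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return leading


def newton_to_power(leading):
    """Coefficients a0..an of the sum of leading[k] * x(x-1)...(x-k+1) / k!."""
    coeffs = [Fraction(0)] * len(leading)
    basis = [Fraction(1)]  # x(x-1)...(x-k+1), lowest power first
    factorial = 1
    for k, d in enumerate(leading):
        if k > 0:
            factorial *= k
            shifted = [Fraction(0)] + basis                      # x * basis
            scaled = [-(k - 1) * c for c in basis] + [Fraction(0)]
            basis = [s + t for s, t in zip(shifted, scaled)]     # times (x - (k-1))
        for power, c in enumerate(basis):
            coeffs[power] += Fraction(d) * c / factorial
    return coeffs


def evaluate(coeffs, x):
    """Evaluate a0 + a1*x + ... by nested multiplication."""
    y = Fraction(0)
    for a in reversed(coeffs):
        y = y * x + a
    return y


differences = forward_differences(YS)   # leading differences of the five values
coefficients = newton_to_power(differences)
```

Because five points fix five unknowns, `evaluate(coefficients, x)` returns the tabulated value exactly at x = 0, 1, 2, 3, 4; between those points it gives the smooth interpolation.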
The parabolic or smoothed curve now obtained from Diagram IV
can be used to furnish values to replace the histogram in
Diagram III. The same unit for x (five years) must, of
course, be used in both cases. Thus [...]. At 30 years,
$dy/dx = 166·9$; at 35 years, 153·8.

The curve obtained in this way is shown in the continuous
line in Diagram III; this curve satisfies the condition that
the areas standing on the 5-year bases, from 20 years to
45 years, should represent on the chosen scale the number of
persons given by the original table, and that there should be
no abrupt changes of curvature.

Since the curve has been chosen so as to satisfy the
conditions for only five age periods, it will not necessarily
satisfy any more; but in this case the [...]
a straight line, which approximately fulfils the conditions to
55 years. If we need greater accuracy in later years we
should calculate new values for the a's and obtain a new
curve, satisfying a new group of area conditions. If we
needed to draw the whole curve accurately, we should have
to devise a method of passing without a break of curvature
from one such parabolic curve to the next; but, as it is, we
only want means of obtaining specified points on the curve,
and that can be done by choosing the special parabolic curve
that is in the neighbourhood of the required points.


Averages.


THE MODE. At the highest point of the smooth curve in
Diagram III the differential coefficient of the ordinate
vanishes; hence $\frac{d^2y}{dx^2} = 0$ in the ogive for the same
value of x. Thus, as is otherwise evident, the ogive is
steepest and there is a point of inflexion, at that value of x
which gives the greatest ordinate in Diagram III. Putting
$\frac{d^2y}{dx^2} = 0$, we have

$$0 = 2a_2 + 6a_3 x + 12a_4 x^2,$$

and

$$x = \left(-3a_3 \pm \sqrt{9a_3^2 - 24a_2 a_4}\right) \div 12a_4,$$

where $a_2$, $a_3$, $a_4$ are given in terms of the differences above.
Writing in those values, we have x = 2·021, which corresponds
to 30·10 years.
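The position of the mode can be checked numerically. The sketch below assumes the coefficients of the quartic through the ogive at ages 20 to 40 (origin at age 20, unit 5 years), as obtained from the differences of 5, 88, 245, 407, 554; these values are computed for this illustration and are not quoted from the lecture. It solves the quadratic for the point of inflexion and evaluates the slope of the ogive there.

```python
import math

# Coefficients of y = a0 + a1*x + a2*x^2 + a3*x^3 + a4*x^4 (assumed, see lead-in).
A0, A1, A2, A3, A4 = 5.0, 43 / 4, 2255 / 24, -95 / 4, 49 / 24


def inflexion_points(a2, a3, a4):
    """Roots of 0 = 2*a2 + 6*a3*x + 12*a4*x^2, smaller root first."""
    root = math.sqrt(9 * a3 ** 2 - 24 * a2 * a4)
    return sorted([(-3 * a3 - root) / (12 * a4), (-3 * a3 + root) / (12 * a4)])


def slope(x):
    """dy/dx of the ogive, i.e. the ordinate of the smoothed frequency curve."""
    return A1 + 2 * A2 * x + 3 * A3 * x ** 2 + 4 * A4 * x ** 3


x_mode = inflexion_points(A2, A3, A4)[0]   # the root where the slope is a maximum
mode_age = 20 + 5 * x_mode                 # back to years of age
ordinate_at_mode = slope(x_mode)
```

The smaller root is taken because the slope of the ogive stops increasing there; the larger root marks the second point of inflexion that a quartic admits.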

If we had included a further term, $x^5$, we should have
a cubic to solve to determine x. If we had only gone as
far as $x^3$ we should have the equation $0 = 2a_2 + 6a_3 x$; that
is, $x = -a_2 \div 3a_3$; but this formula is unsafe, unless the
fourth differences of the original figures are approximately
zero.

The equation taken as far as the $x^4$ term appears to me
to be practically the best in the example we are discussing.
If we choose the coefficients to satisfy the conditions starting
from the age 25, we obtain [...] years as the position of
the highest point. The discrepancy between this and the
30·10 years found from the parabola starting from the age
20 years, arises from the indeterminateness of the original
figures. It seems best to take the value from the former
curve, as the point then lies near the middle of the assigned
values. We adopt then the age 30·1 years as the required
age, and find that the ordinate is 166·9 at that age. The age so
found is called the mode of the group; it is also called the
position of greatest density and of the maximum ordinate.

THE MEDIAN. The abscissa of the point (M) where
the ogive is cut by the horizontal line half way up the
scale (from 0 to 1,000) is called the median. In the
histogram, or the smoothed curve which replaces it, the
vertical through the median divides the curve into equal
areas. When the ogive is drawn, the median can at once
be found graphically. To find it algebraically, take a
parabolic equation as before, satisfied by five points lying
near the median, obtain the coefficients as before, and put
y = 500. One of the roots of the equation is the
median. Starting at 30 years, we have [...]
and, solving by Horner's method, [...] the
median age is 30 + 5x = 38·11 (years).
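A numerical version of this step can be sketched as below. It is not the lecturer's calculation: his equation is not legible here, so the sketch assumes instead the quartic through the ogive at ages 20 to 40 (origin at age 20, unit 5 years), and it finds the root of y = 500 by simple bisection rather than by Horner's method. The result therefore lies close to, but need not coincide with, the figure quoted above.

```python
# Assumed coefficients a0..a4 of the quartic through (0,5), (1,88), (2,245),
# (3,407), (4,554); x = 0 at age 20 and one unit of x is 5 years.
COEFFS = [5.0, 43 / 4, 2255 / 24, -95 / 4, 49 / 24]


def ogive(x, coeffs=COEFFS):
    """Evaluate the polynomial by nested multiplication."""
    y = 0.0
    for a in reversed(coeffs):
        y = y * x + a
    return y


def solve_level(level, lo, hi, tol=1e-10):
    """Bisection for ogive(x) = level, assuming the ogive rises from lo to hi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if ogive(mid) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x_median = solve_level(500, 3.0, 4.0)   # 407 at x = 3 and 554 at x = 4 bracket 500
median_age = 20 + 5 * x_median
```

Putting `level` equal to 250 or 750, with a curve fitted to points near the required age, gives the quartiles in the same way.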

The quartiles are the abscissae of the points (Q1, Q2)
where the horizontal lines one-quarter and three-quarters up
the scale (from 0 to 1,000) cut the ogive. The verticals
through these abscissae in Diagram III would, together
with the median vertical, divide the area into four equal
parts. The quartiles can be found from one of the equations
already written by putting y = 250 and 750 successively
and solving for x; or they can be found graphically.

A rough method of finding these points, often sufficiently
accurate, and saving the more laborious solution, is to assume
that the parts of the ogive between the corners which
contain the median are straight lines. There are 407 per
thousand below 35 years; 93 out of the following
group, which contains 147, are to be taken to reach the
median, which is on the hypothesis of a straight line
(35 + 93/147 of 5) years, that is 38·16 years, a value differing
little from that already obtained. The lower quartile by
either method is 30·2 years, the upper 48·6 years.
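The rough straight-line method is easily mechanised. The following sketch assumes the cumulative figures per 1,000 from the table and treats the ogive as straight between consecutive tabulated ages; `OGIVE` and `age_at_level` are names introduced for the illustration.

```python
# (age, number per 1,000 not older than that age), from the cumulative column.
OGIVE = [(20, 5), (25, 88), (30, 245), (35, 407), (40, 554),
         (45, 679), (50, 784), (55, 864), (60, 919), (65, 959)]


def age_at_level(level, ogive=OGIVE):
    """Age at which the ogive reaches `level`, by straight-line interpolation."""
    for (age0, y0), (age1, y1) in zip(ogive, ogive[1:]):
        if y0 <= level <= y1:
            return age0 + (age1 - age0) * (level - y0) / (y1 - y0)
    raise ValueError("level lies outside the tabulated ogive")


median = age_at_level(500)           # 35 + (93/147) * 5
lower_quartile = age_at_level(250)   # 30 + (5/162) * 5
upper_quartile = age_at_level(750)   # 45 + (71/105) * 5
```

On these figures the straight-line upper quartile comes out a little below the value quoted in the text, which suggests that figure rests on the curved interpolation; the median and lower quartile agree closely with it.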

We have now the following figures:

    Lower quartile          30·2 years
    Mode                    30·10  „
    Median                  38·11  „
    Arithmetic average      40·1   „
    Upper quartile          48·6   „

The arithmetic average is calculated directly in the
ordinary way, but is of little importance in such a group
as this.
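For reference, "the ordinary way" can be sketched as a weighted mean of class mid-points. Two assumptions are made here that are not stated in the lecture: every person is placed at the middle of her age class, and the open class "above 80" is closed arbitrarily at 85. The result should therefore be read as approximate.

```python
# (lower limit, upper limit, number per 1,000); the last class is closed at 85
# purely for the purpose of this sketch.
CLASSES = [
    (15, 16, 0.01), (16, 17, 0.03), (17, 18, 0.2), (18, 19, 1.2),
    (19, 20, 3.1), (20, 21, 8), (21, 25, 75), (25, 30, 157),
    (30, 35, 162), (35, 40, 147), (40, 45, 125), (45, 50, 105),
    (50, 55, 80), (55, 60, 55), (60, 65, 40), (65, 70, 22),
    (70, 75, 14), (75, 80, 4), (80, 85, 1),
]


def grouped_mean(classes):
    """Arithmetic average with every class concentrated at its mid-point."""
    total = sum(n for _, _, n in classes)
    weighted = sum(n * (lo + hi) / 2 for lo, hi, n in classes)
    return weighted / total


average_age = grouped_mean(CLASSES)   # about 40 years on these assumptions
```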

If a person is taken at random from this group, her
most probable age is 30·1 years, the mode. It is as likely as
not that she will be over 38·1 years, the median. It is as
likely as not that she will be between 30·2 and 48·6 years.
The chances are 3 to 1 against her being less than 30·2
years; 3 to 1 against her being over 48·6 years.

Other points can be obtained by dividing the group into
ten equal parts, or one hundred equal parts. These are called
the deciles and percentiles respectively.



The mode in such groups as this seems to be of
special importance, as being the most probable value. It
is entirely unaffected by the extremes. If the Census
authorities had omitted all married women over 50 or all
under 20 years in their enumeration the mode would be still
in the same place. That is very important when we are
dealing with inaccurate figures. In those curves which have
a distinct mode, where the curve first tends upwards, reaches
a height, and then comes down again without ever pausing or
returning to a second height, and where there is a certain
symmetry or similarity of distribution on either side of it, in
such curves the mode is of special importance. If, on the
other hand, you have a regular mountain range represented
by your curve, the mode is probably of much less importance.
If you have a single peak it is probably of importance. But
though it is important in itself it is quite insufficient to
describe the curve; it only tells you the position of one point;
it does not tell you the steepness on either side, or the distance
from there to any assigned point.

The median is affected by extremes to some extent. If
the authorities had omitted all the married women over
50 the median would of course have been shifted, but not
very much, for the area, which would have been left out
at the extreme right, when halved and distributed in the
neighbourhood of the median would be found to have
caused only a very slight displacement of it. That can
be verified from Diagram IV. To take an example which
can be supplied by the diagram, suppose you omit all
those beyond the 800 per thousand, which gives those above 55,
then the line through the 400 would give the median, which a
very rough measurement gives as 33 years. That is to say,
the median has only been shifted five years by leaving out
that immense number. If, instead of omitting these people
over 55, the Census authorities had simply said, "Here is a
married woman, obviously old, we do not know her age," and
had entered her in that category, it would not have affected
the median in the very least. The position of the extremes
does not affect the median, only the number of instances. In
the statistics with which I personally have to deal, often all
that is known is this number. In this respect the median
is very superior to the arithmetical average. The same applies
to quartiles. If we do not know the exact positions of the
instances to the right or to the left of the quartiles it does not
matter, provided we know the number. If you determine the
quartiles and the median, you have three points on the ogive
curve, three positions in the histogram, from which the curve
can often be constructed with fair accuracy. The arithmetic
average, or simply "the average," gives the position of the
centre of gravity of the group when plotted out as in
Diagram III. The arithmetic average lends itself to [...]
computations, but, in my experience, it is the least valuable
of the means or averages which can be calculated; other
people's experience may be different. It is very liable to
error. If a part of the group is accidentally omitted, the
average is at once affected. If the numbers are moved [...]
the positions not very far, you would find by experiment
that the arithmetic average has not moved much; but directly
any numbers are left out, the arithmetic average is disturbed.
But the reason I distrust the arithmetic average and do not
advocate its use is chiefly because it renders such fallacious
arguments possible. If you are comparing one group with
another, after a little interval the arithmetic average may
have remained quite steady when the group has changed
considerably, both the extremes having come in towards the
mean; or it may shift when the group has not really changed
its character, but only shifted its position a little. A
particular change of the arithmetic average may correspond
to an infinite number of different kinds of change in the
group; and it is very often pointed out that a certain group
has changed, that something has improved because the
arithmetic average has changed; whereas it is only altering
the relative positions of two groups which are not connected
in reality. If we have a perfectly homogeneous group, for
instance, if with wage statistics we deal with a set of men
doing similar work and earning similar wages, a change in the
arithmetic average is significant; but if we are dealing with a
composite group composed of skilled and unskilled workmen,
two homogeneous groups merged into one, the arithmetic
average might increase either by the higher group going up
a little while the lower group went down nearly as far, or the
other way about; or by a combination of these two things.

* [...] the arithmetic average is used very [...] while, as a matter of [...]

So the arithmetic average can never give definite information,
and very often gives fallacious information. I have not time,
and perhaps it is not necessary, to dwell upon this point, and
refer to the correction factors for Urban death rates. The
necessity of that method illustrates my meaning in saying that
before an arithmetic average is used, it is necessary to make
sure that the group is homogeneous.
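The fallacy described here can be shown with a small Python sketch. All the wage figures below are hypothetical, invented purely for illustration, and the variable names are not from the lecture. In the first comparison the two component groups move apart while the combined average stays where it was; in the second nobody's wage changes, yet the combined average falls because the proportions of the two groups alter.

```python
from statistics import mean

# Hypothetical weekly wages in shillings (illustrative figures only).
skilled_before, unskilled_before = [38, 40, 42], [18, 20, 22]
skilled_after, unskilled_after = [43, 45, 47], [13, 15, 17]

# Case 1: skilled wages rise by 5s. and unskilled fall by 5s.
average_before = mean(skilled_before + unskilled_before)
average_after = mean(skilled_after + unskilled_after)

# Case 2: wages unchanged, but three more unskilled men are counted.
diluted_group = skilled_before + unskilled_before + [20, 20, 20]
average_diluted = mean(diluted_group)

group_means_before = (mean(skilled_before), mean(unskilled_before))
group_means_after = (mean(skilled_after), mean(unskilled_after))
```

Comparing `group_means_before` with `group_means_after` shows the change that the combined average conceals, which is why the group should be known to be homogeneous before its arithmetic average is relied on.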

The quartiles and the median not only give the definite
position of the median, but also a measurement, which serves
to show how the curve is dispersed from its central position.
The distance between the two quartiles, 18·4 years in this case,
shows to some extent how the curve is dispersed from its
central point. That I shall return to in giving other
measurements of this dispersion.

If we were dealing with a group that did not give any
such regular figure as this, a group to which the mode was
certainly quite inapplicable, it would probably then be best not
to attempt to draw any continuous curve at all, but to keep to
such a diagram as that on page 5, and to calculate the deciles
as accurately as possible. By making some simple assumptions
as to continuity, it would be possible to calculate roughly the
nine deciles, dividing the area into 10 equal parts, and enter
them as a description of the group. I think that is the only
method of satisfactorily representing an irregular group which
cannot be divided into distinct homogeneous groups.

Comparison of Groups.

The ogive diagram lends itself more readily than any other
to the comparison of two groups. I have selected two
groups, which one might wish to compare, from the same
Census table, the husbands whose wives were between
45 and 50 years of age, and the wives whose husbands were
between 45 and 50, which are represented by the lines
LL and KK respectively; and I have calculated, by one
method or the other, the mode, the median and the quartiles of
these groups. Thus, for instance, from the curve K, of all the
wives whose husbands were between 45 and 50 years of age,
as many were less than 45·5 years as were more than that; and
similarly for the quartiles. The curves are very similar, the
husband curve being four years to the right of the other.
The method needs no further comment.






Illustration of Use of the Median.

I may take one example to illustrate the use of the
median. The diagram on p. 18 represents the weekly wages,
valuing everything that is paid in goods and not in money
at an appropriate rate, of three classes of labourers in
England, namely, Artisans in Provincial Towns, such as
Birmingham, Agricultural Labourers (the average for the
whole of England) and Labourers in the same towns from
which the Artisans were selected. The figures are rather rough,
and there is no material for making them exact; but I think
the lines drawn represent with fair accuracy the course of
wages; for if we once established the fact that all agricultural
labourers are below the median, we have simply to count them
and not enquire about their wages. And so if we establish
the fact that any body of men is well above or well below the
median, we have not to enquire into their wages, but simply
to count them; and to find the median we have only to
investigate more carefully the body of men whose wages are
near the median; that is a comparatively easy task, because
the body of men who are near to it are those whom we see
any day in any ordinary industrial undertaking. The
Census figures are bad for this purpose in 1902, and
they were much worse in 1801; and there is a great
deal of computation and guess-work in determining the
position of the median at any time through the century.
But it can be done within certain limits of accuracy where
the task of determining the arithmetic average would
be hopeless. When we have determined the median and trace
out the positions for 110 years we have a much more
interesting and exact piece of information than if we had
made use of the arithmetic average. We have the wage of
that man who is half way up the skilled wage earners; but if
we give the arithmetic average it will carry us no further;
it is simply a numerical quotient. The line in the diagram is
drawn through the estimated positions of the median for all
male adult wage earners in the United Kingdom, at selected
dates. These figures are rough, and should not be quoted
without verification. The only ones calculated are those with
a dot or cross in the figure; intermediate lines are interpolated.


MEASUREMENT OF GROUPS.


SECOND LECTURE.


The Standard Deviation and the Modulus.

The methods I have employed so far for determining the
median and the mode, together with the ordinary method of
determining the arithmetic average, together also with the
quartiles and deciles, give a series of definite quantities
connected with the curve. Each of these quantities, the
mode and the median, performs the function of an average;
that is to say, that number by itself gives briefly one of the
most important positions, one of the most important
characteristics of the whole curve. But no one of these
quantities gives sufficient information to enable us to
reconstruct the curve or to describe it completely. It is
true that if we have given the nine deciles, including the
median, we have nine points on a continuous curve, and in
general it is possible to construct it with reasonable accuracy.
But if we only have the mode, or only the median, we
have not enough to construct the curve. My object, then,
is to develop one or more methods of calculating other
quantities related to the group, which will enable us to
complete or amend the description of the group, as given
simply by one of the averages.

We will always suppose that the group is described in
relation to a horizontal axis OX, and may be of any nature
about the axis. What we have found so far in the median or the
mode is one point on that group, one position on that axis: in
the case of the mode, the position under the highest point;
in the case of the median, the position the line through which
divides the curve into two equal areas; in the case of the
average, the abscissa of the centre of gravity. I have
now to find a second quantity which will enable us to describe
or determine the shape of the curve when you are given this
one position on it. The method I am going to take is
independent of any assumed shape of the curve, and it is
applicable to both the groups to which I referred earlier:
the group which is supposed to be an accurate representation
of the facts, and that which represents only samples of a
larger group whose observation is not completely made. I
have first to describe the well-known method of calculating
the deviations from the average, and then to pass on to find
the average deviation, the average square of the deviation,
and the average cube of deviation. Let there be $n$ observations
represented by $x_1, x_2, \ldots, x_n$; let $\bar{x}$ be the abscissa of the
centre of gravity; then $\bar{x}$ is the average of the group, the sum
of the $x$'s divided by $n$, their number. From each of the $x$'s
subtract the abscissa of the centre of gravity; then
$x_1 - \bar{x},\ x_2 - \bar{x},\ \ldots,\ x_n - \bar{x}$ are the deviations of the
observations from their average. In some connections they are
called the errors from the average, but I shall adopt the word
“deviation” in every case. In the first place it is to be
noticed that the sum of the deviations is necessarily zero; for
$\sum (x - \bar{x}) = \sum x - n\bar{x} = 0$. The sum of the squares of the
deviations is $\sum (x - \bar{x})^2$, and $\frac{1}{n}\sum (x - \bar{x})^2$ is the
mean square of the deviations, which is
otherwise called the second moment of the deviations, about
the origin in this case. The word moment is from a
dynamical analogy; it is used in this connection by Professor
Karl Pearson.

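The following notation is adopted. The moments
measured about the origin are written $\mu_1', \mu_2', \ldots$, and about the
centre of gravity $\mu_1, \mu_2, \ldots$, so that

$$\mu_1' = \frac{1}{n}\sum x = \bar{x}, \qquad \mu_2' = \frac{1}{n}\sum x^2, \qquad \mu_3' = \frac{1}{n}\sum x^3,$$

$$\mu_1 = 0, \qquad \mu_2 = \frac{1}{n}\sum (x - \bar{x})^2 = \mu_2' - \bar{x}^2, \qquad \mu_3 = \frac{1}{n}\sum (x - \bar{x})^3 = \mu_3' - 3\bar{x}\mu_2' + 2\bar{x}^3.$$

The quantities we need are not the moments about an
arbitrary origin, but the moments about the centre of gravity;
but it is far easier to calculate the moments about an
arbitrary origin.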




It will now be convenient to follow the figures on the
table and diagram adjoining. The figures are taken from the
Report of the Anthropometrical Committee of the British
Association, in 1881. They are selected merely for being a
convenient group by which to explain the calculation of these
moments. The heights of 1,935 persons were given as between
certain inches, e.g., between 57 and 58 inches, as under the column
headed $x + 57\tfrac{1}{2}$. I have the origin at 57½ inches, and the
abscissae for the successive groups are 1, 2, 3, ..., 20. The
number of instances in those various groups are those given
in the second column, under the letter $n$; one person under
58 inches, one between 58 and 59 inches, and so on. The
instances in this case occur in groups, and we are not able to
separate them by means of the data, hence each deviation will
occur in most cases more than once. Thus, a deviation shown
between 64 and 65 inches occurs 153 times. Instead of adding
the $x$'s simply to obtain the deviation we multiply each
deviation by the number $n$, the number of times it occurs, and
so obtain the third column $xn$, whose sum is 19,431. This sum
is to be divided by 1,935, the total number of observations, to
give the first moment about the origin, namely, 10.042, and
this gives the position of the centre of gravity measured from
the origin, 57½ inches. The columns under $x^2 n$ and $x^3 n$ require no
explanation. The totals 207,987 and 2,348,427 are divided
by $n$, giving 107.5 and 1,214, $\mu_2'$ and $\mu_3'$ in the notation adopted.
It now remains to reduce these moments about the origin to
the moments about the centre of gravity, by means of the
formulae given above.
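
The reduction just described can be checked with a short Python sketch. The totals are taken as they can best be read from the printed figures (the scan is partly illegible, so they should be treated as assumptions), the variable names are illustrative only, and the reduction formulae are the standard ones for passing from an arbitrary origin to the centre of gravity.

```python
# Totals from the height table, as read from the text (assumed values).
n = 1935            # number of persons
sum_xn = 19431      # total of the column x*n
sum_x2n = 207987    # total of the column x^2*n
sum_x3n = 2348427   # total of the column x^3*n
origin = 57.5       # inches; the abscissae 1..20 are measured from here

# Moments about the arbitrary origin.
m1 = sum_xn / n
m2 = sum_x2n / n
m3 = sum_x3n / n

# Reduction to the centre of gravity.
mu2 = m2 - m1**2
mu3 = m3 - 3 * m1 * m2 + 2 * m1**3

print(f"centre of gravity: {m1:.4f} units, i.e. {origin + m1:.2f} inches")
print(f"moments about the origin: {m2:.1f} and {m3:.0f}")
print(f"moments about the centre of gravity: {mu2:.3f} and {mu3:.3f}")
```

Working from the origin keeps every intermediate figure an integer until the final divisions, which is the practical advantage claimed for the method.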

The practical simplicity of evaluating the moments by this
method arises from the fact that we are dealing in the $x$'s
with a series of numbers ascending in uniform order, 1 to 20,
and that the whole arithmetic computation is very simple and
very easily checked, whereas if we proceed on the direct
method of writing down the position of the centre of gravity,
which will naturally not be an exact number, each of the
deviations will introduce as many decimal places as are kept
in our calculation; and the squaring and adding will be
very arduous, and we have no ready means of checking our
results. It is therefore worth while to take the formula, and
choose our origin so as to give the least arithmetic work and
obtain the second and third moments indirectly. There is a
small correction to be made for the moments so calculated for
the second, fourth, and other even moments. I will deal only
with the second. It is to be observed that in the whole
calculation it is assumed that all the persons in a particular
group are exactly at the middle of that group, e.g., that
153 persons in the 64 to 65 inches have the height exactly
64½ inches. It is obvious that that will not be the case, and
it is easily seen that that will introduce a definite error in the
calculation of the second moment. For if we take one of
those groups in particular, and make the assumption that the
whole number lies at its middle point, we are representing it
by a rectangle instead of by a trapezium with the side nearer
the centre of the group longer than the other; a little
consideration will show that that makes the second moment
too great. Mr. Sheppard has shown that under certain
circumstances it will be sufficient correction to subtract the
fraction $\tfrac{1}{12}$ from the second moment calculated on the
assumption of uniform distribution at the middle points of
the groups to obtain a moment in a true approximation.
On page 21 the corrected moment, $\mu_2$, is 6.564, while the
uncorrected moment is 6.647. The correction will be
$\tfrac{1}{12}$ only if the difference between successive groups is one
unit of abscissa; if the difference was $h$, we should have
to multiply by $h^2$; but for practical work it is best to take
the unit as the distance between groups which we are dealing
with, and hence the correction is in the form which is
of practical use.

The “standard deviation” is defined as the square root of
the second moment about the centre of gravity. Professor
Karl Pearson uses $\sigma$ to denote it, and $\sigma$ is 2.562 inches in this
case. It is sometimes more convenient to deal with the square
root of twice the moment, which is called the modulus, and
denoted by the letter $c$. Professor Edgeworth uses the modulus,
whereas Professor Karl Pearson uses the standard deviation. We shall
see the appropriateness of the modulus when we deal with the
curve of error. The modulus for this group is 3.623 inches.
It is a very remarkable fact that the modulus for the height
of groups of men is almost universally very nearly 3.6 inches.
Professor Edgeworth gives a list of 10 such groups in the
Jubilee volume of the Journal of the Royal Statistical Society;
the moduli are 3.6 (United Kingdom), 3.6 (England), 3.4
(Scotland), 3.6, 3.7, 3.8 (United States), 3.7 (Belgium),
3.7 (Italy). I merely call attention to that in passing, to
give an idea that the modulus is of real significance and
not a mere arithmetical calculation.
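
As a minimal sketch of the last two steps, the following Python lines apply Sheppard's correction to the uncorrected second moment quoted above and then form the standard deviation and the modulus. The variable names are illustrative, and the group width `h` is taken as one unit (one inch), as in the text.

```python
import math

mu2_uncorrected = 6.647   # second moment about the centre of gravity, from grouped data
h = 1.0                   # width of each group, in units of abscissa

# Sheppard's correction: subtract h^2 / 12 from the grouped second moment.
mu2 = mu2_uncorrected - h**2 / 12

sigma = math.sqrt(mu2)        # standard deviation
modulus = math.sqrt(2 * mu2)  # modulus, c = sigma * sqrt(2)

print(f"corrected second moment = {mu2:.3f}")
print(f"standard deviation      = {sigma:.3f} inches")
print(f"modulus                 = {modulus:.3f} inches")
```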

Average Deviation. 

For the next few minutes I propose to assume that the
curve I am dealing with is symmetrical about its centre of
gravity. The curve of heights which is sketched on page 21
is in fact very nearly symmetrical. If the curve is actually
symmetrical all the odd moments are easily seen to be zero,
while the even moments are not. Then this quantity $\sigma$ or $c$,
whichever we adopt, serves to measure the distance of the
curve from its average, to use a clumsy phrase, or the
dispersion about the average. Before discussing the
appropriateness of this measurement I have to explain two
simpler methods of measuring the same thing: one based on
the first power of the deviations, and the other based on the
distance between the quartiles. First for the average or mean
deviation which, in the notation I am using, is called $\eta$.
If we write down the deviations in the method just defined
and add them up, we obtain zero; but if we treat all the
deviations as positive and add up their absolute values we
do not obtain zero. The calculation is as follows: treat the
negative deviations and the positive deviations separately.
The sum of the negative deviations is

$$\sum n\,(10.0419 - x) = 10.0419 \times 824 - 6331,$$

from the figures to the left in the table on page 21. The
sum of the positive deviations is

$$\sum n\,(x - 10.0419) = 13100 - 10.0419 \times 1111,$$

from the numbers in the right compartment. The average
deviation is, therefore,

$$\frac{1}{1935}\{13100 - 10.0419\,(1111 - 824) - 6331\} = 2.01 \text{ (inches)}.$$
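
The same arithmetic can be followed in a few lines of Python. This is only a sketch: the counts and partial totals are those read from the passage above (824 persons with total 6331 below the average, 1111 persons with total 13100 above it), and the names are illustrative.

```python
mean = 10.0419                     # centre of gravity, in units from the origin
n_below, sum_below = 824, 6331     # persons, and total of x*n, below the mean
n_above, sum_above = 1111, 13100   # persons, and total of x*n, above the mean

negative = mean * n_below - sum_below   # sum of (mean - x) over the lower groups
positive = sum_above - mean * n_above   # sum of (x - mean) over the upper groups

# Treat all deviations as positive and divide by the whole number of persons.
mean_deviation = (negative + positive) / (n_below + n_above)

print(f"negative side {negative:.1f}, positive side {positive:.1f}")
print(f"mean deviation = {mean_deviation:.2f} inches")
```

The two partial sums come out nearly equal, as they must, since the deviations from the average add to zero.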

Probable Error. 

The other simple method is based on the quartiles.
Calculate the quartiles of this group by any of the methods
already given, and you will find them to be approximately
65.8 inches and 69.3 inches. Since the median as given on
page 21 is 67.566, one quartile is 1.78 inches below the
median, and the other 1.72 inches above it. Half the distance
between the quartiles is called the probable error. It is a
term which is so firmly in use that there is no hope of
improving it, but it is one of the most erroneous terms in use
in mathematics. Half the distance is 1.75 inches. If we take
a person at random from this group and measure his height,
it is as likely as not the height will be found to be between
the quartiles, for the space contained between the ordinates
at the quartiles is exactly half the whole curve, hence the
phrase “probable error.”

If we were dealing with the special distribution determined
by the equation to the curve of error (see p. 34), we should
have the following relations: average deviation = modulus
÷ $\sqrt{\pi}$, and probable error = modulus × .4769. These relations
are approximately true for this distribution of heights, for the
values of the mean deviation and probable error found from
these equations when the modulus is 3.623 inches are 2.04 and
1.73 inches respectively, while the numbers found above are
2.01 and 1.75.
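
A brief numerical check of these two relations, sketched in Python. It assumes only the standard properties of the normal curve; the constant .4769 is recovered from the upper quartile of the unit normal curve via the standard library's `statistics.NormalDist`, and the names are illustrative.

```python
import math
from statistics import NormalDist

c = 3.623  # modulus of the height group, in inches

# Mean deviation of the curve of error: modulus divided by sqrt(pi).
mean_deviation = c / math.sqrt(math.pi)

# Probable error: the quartile distance of the normal curve expressed in
# units of the modulus (sigma = c / sqrt(2)), which is about 0.4769.
rho = NormalDist().inv_cdf(0.75) / math.sqrt(2)
probable_error = rho * c

print(f"rho = {rho:.4f}")
print(f"mean deviation = {mean_deviation:.2f} inches")
print(f"probable error = {probable_error:.2f} inches")
```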

Those methods of describing groups are, however,
applicable to groups which do not conform, even approximately,
to the law of error. I shall now treat them without the
assumption that they do conform. The probable error is the
measure of dispersion which is most quickly calculated. We
can write down the quartiles very rapidly, and take half their
difference at once. But that only takes into account the
positions of the two quartiles, and does not take into account
the positions of the extremes, but only their sign, and,
depending as it does only on two quantities, is liable to a large
amount of accidental error. The mean deviation on the other
hand takes into account the position as well as the number of
all the quantities, and is therefore less liable to accidental
error, and also it does not take at all long to calculate with
simple numbers. The modulus and the standard deviation,
again, take into account every observation, but they give extra
weight to those which are a great distance from the average.
In some cases that is right; in others it is not. If we are
basing arguments as to the group and the shape of the group
on probability, then very likely it will be correct to give this
extra weight to an object which is far from the average, for
the farther from the average the less the probability, and in
some cases the probability diminishes very rapidly as we move
from the average. If we are not going to make assumptions
about the shape of the curve, nor apply the principles of
probability, I do not know that we shall find any justification
for taking the square, rather than the mean, deviation. As a
rough rule we may say that we pass appropriately from the
probable error to the mean error, and from the mean error
to the standard error as the curves with which we are dealing
become more definite and perfectly continuous, and
approximate more and more nearly to a curve with a definite
algebraic equation. For very rough measurements which are
not continuous and which are not to be corrected, the probable
error, measured as half the distance between the quartiles,
will very likely be the best measurement. As the curve
attains a definite shape, and as we are able to treat the
observations as more and more continuous, it will be well to
take the mean error, and finally, if we have a perfect
algebraic curve, then very likely it will be most correct to
take the “standard deviation.”


Measurement of Skewness. 

Now to pass on to unsymmetrical curves. We have
obtained by one of the averages the position of the curve and
by one of these measures of dispersion one measure of its
shape. We shall now obtain the measure of its want of
symmetry, or briefly, of its skewness. Most curves have some
degree of skewness; but in some cases it is negligible.

As an example of a curve with considerable skewness, we
may take Diagram III, on p. 6. The curve is elongated to
the right; the mode is to the left, the centre of gravity to the
right of the median. This is the general order of those three
averages. If a skew curve is formed by stretching a
symmetrical curve to the right, the stretching shifts the centre
of gravity, relatively to the median; or, from another point of
view, if a curve is heaped up to the left and stretched to the
right, experiment will show that the line through the median
is to the right of the highest point.

There are very many possible ways of measuring this
skewness. One obvious measurement is simply the distance
of the centre of gravity from the median. Another is to use
the quartiles. Call the positions of the quartiles $Q_1$, $Q_2$, the
position of the median $O$, of the mode $M$, and of the centre of
gravity $G$. In a symmetrical curve the distance $OQ_2$ is equal
to the distance $OQ_1$, whereas in a skew curve it will not be.
In a skew curve stretching to the right, the upper quartile
to the right is further from the median than the lower
quartile, and the difference between those two measures will
form another means of estimating its skewness. The third
method is to take the first power of the deviations, and compare
the excess on one side the centre of gravity with the defect
on the other. The fourth method is to take the third power
of the deviations and consider its absolute magnitude. All
those methods have their uses. I propose to deal with three
of them. I will first take that which is arithmetically the
simplest. The simplest measurement, that which you can
calculate almost instantly, is the difference between the
distances from the quartiles to the median. But that gives a
concrete quantity, in the case before us so many inches;
whereas it is convenient to measure the skewness as an
absolute quantity, on a scale from +1 to −1; and we must
therefore reduce this concrete quantity to an absolute
quantity. The proper method of doing that is to divide it by
the modulus, which is a concrete quantity, in this case so
many inches. The one divided by the other gives an absolute
measurement, which would serve to measure the skewness.
But it is better to multiply that measurement by the constant
3.29 (see p. 36, below) before using it, to bring it into
conformity with the theory of probability; in the same sort of
way as the multiplication of the second moment by 2 to get
the modulus brings the standard deviation into conformity
with the methods of probability.

Another and yet simpler method of measuring almost
exactly the same quantity is to divide the difference between
those two quantities by their sum, that is to say by twice the
probable error; then if we multiply that by 3.14 (see p. 36,
below), we shall obtain the same measurement very nearly
as before. This method supplies a good rough measurement
which is very rapidly calculated; we write down the median
and the two quartiles, calculating them roughly or by one of
the more complete methods given above, and at once write
down the probable error; by this means the skewness of the
group can be calculated in five minutes. But this measurement
depends on the positions of three points only, which are
subject to accidental errors, and the parts outside the quartiles
have not much influence on the result.
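
The two quartile measures can be sketched as a small Python function. This is an illustration under stated assumptions, not the lecturer's own working: the multipliers are derived from the relation quoted later in these lectures for the curve of error (quartile distances of ρc ± ⅔jρ²c, with ρ = .4769), which gives 3/(4ρ²), about 3.30, when dividing by the modulus and 3/(2ρ), about 3.15, when dividing by the sum of the distances. The function and variable names are hypothetical.

```python
RHO = 0.4769  # probable error of the curve of error, in units of the modulus

def skewness_from_quartiles(lower_q, median, upper_q, modulus=None):
    """Quartile estimate of the skewness j.

    With a modulus, the difference of the two quartile distances is divided
    by it; without one, it is divided by their sum (twice the probable error).
    """
    lower_dist = median - lower_q
    upper_dist = upper_q - median
    diff = upper_dist - lower_dist
    if modulus is not None:
        return diff / modulus * 3 / (4 * RHO**2)
    return diff / (upper_dist + lower_dist) * 3 / (2 * RHO)

# Height group: quartiles 1.78 in. below and 1.72 in. above the median.
median = 67.566
j_modulus = skewness_from_quartiles(median - 1.78, median, median + 1.72, modulus=3.623)
j_quartiles = skewness_from_quartiles(median - 1.78, median, median + 1.72)
print(f"by the modulus: {j_modulus:+.3f}; by the quartiles alone: {j_quartiles:+.3f}")
```

For the height figures the two versions agree closely with each other, which is all the sketch is meant to show; as the text warns, a measure resting on three positions only is liable to accidental error.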

A measure which is influenced by all the items is obtained
by taking the third moment about the centre of gravity; this
in itself is a measure of the skewness, but it is not of the
right dimensions, for it is a concrete quantity of the order of
a cube, as the deviations have been cubed; to reduce it to an
absolute quantity it must be divided by $c^3$. Calling this
measure $j$, we have $j = \mu_3 / c^3$, which in the group given on
p. 21 is equal to +.016.

In a curve which is nearly symmetrical and approximates
to the curve of error, the distance between the arithmetic
average and the median will be $\tfrac{1}{3}jc$, and the distance between
the arithmetic average and the mode will be $jc$, and those
relations supply a third method of estimating the skewness.


DIAGRAM VIII.

Daily Wages of Belgian Coal-miners.




First estimate the modulus, and then calculate the position of
either mode or median and the arithmetic average, divide the
distance by $c$ or by $\tfrac{1}{3}c$, and we obtain $j$. But that is not an
accurate method if we use the mode, which cannot be
precisely determined; while if we use the median, we are
depending upon a single position. The formulae to be
preferred are

$$j = \frac{OQ_2 - OQ_1}{OQ_2 + OQ_1} \times 3.14 \qquad\text{and}\qquad j = \frac{\mu_3}{c^3},$$

the former perhaps when the curve is not approximately the
curve of error.

The adjoining Diagrams illustrate the practical use of the
technical quantities which I have now discussed. In 1896
the Belgian Government undertook an Industrial Census, and,
amongst other things, they collected figures of the wages of
most of the workpeople of Belgium. We have here in
graphic form the daily wages of the Belgian coal miners in
1896. A supplementary enquiry was conducted in 1900 over
nearly the same area, and the result is given just below. The
methods we have developed give us a rapid means of
comparing the results of those two enquiries. It is the
rectangular figures only with which we have to deal at
present. The average increased from 3.08 francs to
5.[?]0 francs between the dates, the modulus from 1.20 to
2.0[?]7 francs, the skewness changed from a negative skewness
of −.10 to a positive one of .22. Those three statements
rightly understood and interpreted give in a brief form the
result of the Census. The average has increased, more money
went in wages, and the modulus and standard deviation have
increased very much. There was a development, therefore,
of wages away from the average, either by highly skilled
workers increasing their wages greatly, or by a body of
unskilled workers coming into existence. If you look at the
curve you will see the dispersion is chiefly increased to the
right, and that increased standard deviation is due either to
the inclusion of a higher grade of workmen than had been
included before, or to the fact that the higher grades of work
had obtained a great increase of wages. I am inclined to
think it possible that the increase of dispersion is partly due
to the erroneous inclusion of people in the second enquiry
which were not included in the first, but I have no means of
going behind the figures. The change of $j$ comes from the
same sort of reason, that a body of skilled workmen were
obtaining higher wages, or that the number of skilled
workmen had increased. Either of those means would
increase $j$ in a positive direction. This use of the letters
may be left for consideration.


Returning for a moment to the use of deviations in
connection with the median and arithmetic average, I have
to point out the curious relation between the two. The
arithmetic average is that quantity from which the sum of
the deviations is nothing, and the sum of the squares of the
deviations the least possible. The second result is obtained
instantly from the formula already given, $\mu_2 = \mu_2' - \bar{x}^2$. The
mean square of the deviations from the arithmetic
average is $\mu_2$; the mean square from some other origin
is $\mu_2'$; and from that formula $\mu_2$ is always less than $\mu_2'$. The
median on the other hand makes the sum of the first powers
of the deviations a minimum, and the sum of the zero powers
zero. If we take the zero power of the deviations, each
deviation is replaced simply by 1, taken with the sign of the
deviation, and then from the definition
of the median we find the sum of the zero powers measured
from the median is zero. That the sum of the first powers is
a minimum can be readily demonstrated, most easily by an
analogy. Suppose that it is required to run from a telephone
exchange separate wires to every one of $n$ places in a straight
line, where should the exchange be placed, so as to use the
least total amount of wire? At the median position. For if
you move from the median position to the right or to the left
you will find immediately that you are adding more wire than
you are subtracting. Supposing there are 20 stations, and
you have a position between the 10th and 11th; if you move
to a position between the 11th and 12th, you have to increase
your distance from 11 stations and diminish it from 9, in every
case by the same length of the wire. The wires correspond
to the deviations; and the sum of lengths of the wires is the
sum of the lengths of the deviations. Consideration of this
illustration will show that the sum of the deviations is a
minimum when they are measured from the median, but that
the median is not quite determinate, for if there are an even
number of stations the sums of the deviations measured from
all points between the two central stations are the same.
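
The telephone-exchange illustration is easily tried numerically. The sketch below, in Python, places 20 stations at the assumed positions 1, 2, ..., 20 (the spacing is an arbitrary choice for illustration) and compares the total length of wire for several positions of the exchange; a second helper shows the corresponding property of the arithmetic average for squared deviations.

```python
def total_wire(exchange, stations):
    """Sum of the distances from the exchange to every station."""
    return sum(abs(s - exchange) for s in stations)

def sum_of_squares(origin, values):
    """Sum of the squared deviations of the values from the given origin."""
    return sum((v - origin) ** 2 for v in values)

stations = list(range(1, 21))   # 20 stations at positions 1, 2, ..., 20

for position in (10.2, 10.5, 10.8, 11.5, 9.5):
    print(f"exchange at {position:4.1f}: total wire {total_wire(position, stations):.1f}")

# For squares the arithmetic average, not the median, gives the least sum.
values = [1, 2, 3, 10]
mean = sum(values) / len(values)
print(sum_of_squares(mean, values), sum_of_squares(2.5, values))
```

Any position between the 10th and 11th stations gives the same total, while a move to between the 11th and 12th lengthens 11 wires and shortens only 9.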
The subject discussed in this section is full of technical
difficulties, and it will be impossible to cover the subject
adequately in the short space allotted to it. It must then be
regarded as containing rather a summary of those important
points connected with the theory of error, which I shall have
to use subsequently. While making it as complete as possible
in itself, in several cases I shall have to ask acceptance
without proof of results which I shall find it necessary to use
at a future date.

Among the various shapes assumed by groups of observations
of any kind which are (as in the groups already taken)
grouped in a more or less regular way about the central line,
there is one distribution of the various deviations about
their centre which is regarded as normal, and the curve
representing it is called the curve of error. And it is the
deduction of the equation of that distribution which I have
first to deal with. After we have the equation we will discuss
to what extent the normal curve is actually found in the kind
of statistics with which we deal. The normal curve can be
obtained from the statistics found in games of chance, or
from the statistics which may be obtained by counting the
occurrence of specified digits in mathematical tables, or from
anthropometric measurements, or again from some groups of
social statistics and from some groups of vital statistics. The
deduction of the equation I am going to take is the only one
which I think lends itself to purely algebraic treatment.
Other deductions depend upon the use of differential calculus
or even of the theory of functions.

Let us consider some occurrence for which the chance
is $p$, the chance against $q$, so that $p + q = 1$. Let us suppose
that the event which may or may not give the occurrence takes
place $n$ times again and again, and that in each $n$ times we
count how often success is obtained. For instance, suppose
we pitch a coin $n$ times and count how many heads are found
and then repeat the $n$-fold experiment again and again and
register in each case the number of heads, that would give a
series of the kind I have in mind. For a small number of
experiments, if each set of experiments contained 10 tries or
any small finite number, it is easy to set down the probabilities
of the various numbers of successes. And it is also clear as
soon as the algebra of the method is tackled, that there is a
limit towards which these chances tend as the number of
experiments in each group is indefinitely increased. What we
have to do first is to find the limit towards which such a series
of experiments tends when the $n$ is increased indefinitely.



The diagram annexed represents the various chances of
the numbers of heads in the experiments of pitching a coin
12 times. The most probable number of heads is of course six,
the least probable none, or 12, and the probability of 0, 1, 2,
up to six, is continually increasing. If we erect 13 ordinates
representing the probability of no heads, one head, and so on
up to 12 heads, we get the diagram marked $(\tfrac{1}{2} + \tfrac{1}{2})^{12}$. If we
take another kind of experiment where the chances for success
and failure are not equal, e.g., where the chance of success
is .3, and perform the experiment 10 times, we get the
probabilities of none, one, two, and so on up to 10 successes
represented by the following diagram:


$(.3 + .7)^{10}$



The first curve is of course symmetrical, the second curve
unsymmetrical. What we have to do is to deduce the
shape of the curve when the index is infinite, whether the
chance in favour is one-half, or whether the chances for and
against are unequal.

If $p$ is the probability of an event, and $p + q = 1$, then the
probability of $m$ successes in $n$ trials is
$\frac{n!}{m!\,(n-m)!}\,p^m q^{n-m}$, and
successive values of $m$ give the terms of the binomial
expansion $(p+q)^n$.
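
As an illustration, the ordinates of the two diagrams can be tabulated in Python from the binomial formula. The symmetrical case is the coin pitched 12 times; for the unequal case the chance of success is taken as 0.3 in 10 trials, which is an assumed reading of the figure. The function name is hypothetical.

```python
from math import comb

def binomial_terms(p, n):
    """Probabilities of 0, 1, ..., n successes in n trials with chance p each."""
    q = 1 - p
    return [comb(n, m) * p**m * q**(n - m) for m in range(n + 1)]

coin = binomial_terms(0.5, 12)   # symmetrical case
skew = binomial_terms(0.3, 10)   # unequal chances; p = 0.3 is an assumed value

# A rough picture of the 13 ordinates for the coin.
for m, prob in enumerate(coin):
    print(f"{m:2d} heads: {prob:.4f} " + "#" * round(prob * 100))

print("most probable number of successes in the skew case:", skew.index(max(skew)))
```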

Assume that $np$ is integral. Let $np = r$, $nq = s$, $r + s = n$.
Denote successive terms by $u_0, u_1, \ldots, u_n$.
Then $u_s$, which is the greatest term, $= \frac{n!}{r!\,s!}\,p^r q^s$.

The assumption that $np$ is integral made above does not
affect the limiting form of the equation.

It is at this point necessary to consider which terms are to
be rejected, when $n$ is made infinite. If $x$ is finite, if we move
through only a finite number of terms from the greatest
ordinate, the ordinate $u_{s+x}$ equals the ordinate $u_s$. This part
of the curve approximates to a horizontal straight line. To
take a numerical instance, the chance of obtaining a number
of heads within a few of 500 in 1,000 tosses is practically
equal to that of obtaining exactly 500
heads. On the other hand if $x$ is infinite, it appears that
$u_{s+x}$ is zero. If the figure is drawn so as to show finite values
of $x$ we obtain a horizontal straight line; but if an attempt is
made to include infinite values of $x$, the curve becomes the axis
of $x$ and a finite vertical line through the origin.

But it becomes clear, if we examine the shape for different
finite values of $n$, that the curve has a definite shape and finite
curvature near the centre. Before we go further let us take
an analogy. If we take an hyperbola and try to include the
whole curve in our figure the curve will coincide with its
asymptotes. In order to draw the curve so that the part
between the asymptotes and the vertex can be seen, we must
adopt a particular scale so as to obtain the length from the
vertex to the centre as a finite quantity. Again, if we pass
from the ellipse to the parabola by the process of pushing the
centre to infinity you have, in order to obtain the finite part of
the parabola at all, to make the hypothesis that $b^2/a$ is finite.
In order to get the finite part of the curve of error we shall
have to select that part where the ratio of $x^2$ to $n$ is finite.
Then it will be found that we shall obtain the part of the
curve that has a definite curvature and a definite shape in a
finite form. Let us assume, then, that $x^2/n$ is finite; and let us
substitute for $x^2/n$ the quantity $z^2$ with the factor $2pq$. The reason
for that factor will soon be obvious. Take $c^2 = 2pqn$, so that
$x = zc$. We then obtain the equation $\log u_{s+x} - \log u_s = -z^2$,
when all vanishing terms are neglected. If the above
deduction is carefully examined it will be found that all the
terms omitted are infinitesimal in comparison with those
retained, when $n$ is infinite.

Removing logarithms, and writing $y$ for $u_{s+x}$, we have

$$y = u_s\,e^{-z^2} = u_s\,e^{-x^2/c^2} = u_s\,e^{-x^2/(2pqn)}.$$


We are still at liberty to choose a scale for the ordinates,
and it is most convenient to choose that which makes the greatest
ordinate $\frac{1}{c\sqrt{\pi}}$, for then the area bounded by the curve and
the axis of $x$ becomes unity; then each part of the area represents
the probability of certain occurrences, for the whole curve
represents 1, which stands for certainty. An alternative is to
take the ordinate as $\frac{N}{c\sqrt{\pi}}$, so that the area of the curve is $N$,
where $N$ is the number of experiments. Then the area standing
on any part of the axis represents the most probable number
of events corresponding to that part.

Now let us go back and take the terms we have so far
rejected, which involve $1 \div \sqrt{n}$. Each of those contains the
factor $\frac{p - q}{c}$. It is convenient to call that quantity $2j$,
for then we shall find that $j$ has the meaning already assigned
to it (see p. 28, and for proof see p. 36).

Re-writing the equation with that notation, and then
expanding the part which contains $j$ and neglecting the
higher powers of $j$, we have
$$y = \frac{1}{c\sqrt{\pi}}\,e^{-x^2/c^2}\left\{1 - 2j\left(\frac{x}{c} - \frac{2x^3}{3c^3}\right)\right\}.$$

It is easily seen that $jc = \tfrac{1}{2}(p - q) = \tfrac{1}{2} - q$. The centre
of gravity of that curve can be shown to be at the origin by
integration. The area of the curve is of course the integral
of $y\,dx$, taken between plus infinity and minus infinity. The
part of the integral which does not contain $j$ is a well-known
definite integral, which equals unity. It can be seen that the
part containing only odd powers of $x$ does not affect the
definite integral. Hence the area is unity.

Now let us calculate the error of mean square of the curve
from the equation. It is obtained by multiplying the element
of area $y\,dx$ by the square of its distance ($x$) from the centre
of gravity, and adding up all the parts so obtained, and then
dividing by the whole area, i.e., unity.

It is easily seen that the $j$ term does not enter into the
result, which is therefore $\int_{-\infty}^{\infty} y\,x^2\,dx = \tfrac{1}{2}c^2$ by integration by
parts. Comparing this with p. 23, we see that $c$, thus
calculated, is the modulus, as there defined. The third
moment is $\int_{-\infty}^{\infty} y\,x^3\,dx$, divided by the area, which is unity;
integrating by parts we obtain that the skewness, as defined
on p. 28, is equal to the $j$ in this equation. The constants
in the equation to the curve of error, as written above, are
then the modulus and skewness as defined for curves in
general. The average deviation, as defined on p. 24, is found
by integrating $-\int_{-\infty}^{0} y\,x\,dx + \int_{0}^{\infty} y\,x\,dx$, to be $\frac{c}{\sqrt{\pi}}$, and does
not involve $j$.

The equation in its integral form is, if stands for 


area on abscissa from origin to m 






the lower sign being taken when x is negative, 

This does not admit of any simple evaluation, but it has
been tabulated for a wide range of values of x.* From those
tables it is found that the "probable error" for the
symmetrical curve (where j is zero) is c × ·4769, which is written
ρc. For the unsymmetrical curve the distances between the
median and the quartiles can be shown to be
$\rho c \pm \tfrac{2}{3}\,j\rho^2 c$; while the distance between the centre
of gravity and mode is jc,† and between the centre of gravity and
the median is $\tfrac{1}{3}jc$,† as used on pp. 27-29 above, where the
resulting numerical values are given.
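A form of the skew curve consistent with the statements above can be sketched as follows. It is a reconstruction and not a quotation: it assumes only the properties stated in the text (unit area, centre of gravity at the origin, second moment ½c², mode at a distance jc from the centre of gravity, higher powers of j neglected), and it assumes that the skewness j is the third moment divided by c³.

```latex
% Assumed form of the skew curve of error (first power of j only):
y = \frac{1}{c\sqrt{\pi}}\, e^{-x^2/c^2}
    \left\{ 1 - 2j\left( \frac{x}{c} - \frac{2x^3}{3c^3} \right) \right\}

% Moments about the origin:
\int_{-\infty}^{\infty} y\,dx = 1, \qquad
\int_{-\infty}^{\infty} x\,y\,dx = 0, \qquad
\int_{-\infty}^{\infty} x^2 y\,dx = \tfrac{1}{2}c^2, \qquad
\int_{-\infty}^{\infty} x^3 y\,dx = j\,c^3 .

% Mode: dy/dx = 0 gives x \approx -jc.
% Area to the left of x, writing t = x/c:
\int_{-\infty}^{x} y\,dx
  = \tfrac{1}{2} + \frac{1}{\sqrt{\pi}}\int_{0}^{t} e^{-s^2}\,ds
    + \frac{j}{3\sqrt{\pi}}\,(1 - 2t^2)\,e^{-t^2}

% Setting this area to 1/2 gives the median, x \approx -\tfrac{1}{3}jc;
% setting it to 1/4 and 3/4 gives quartiles at distances
% \rho c \mp \tfrac{2}{3} j \rho^2 c from the median.
```

On this form a positive j places the mode and the median to the left of the centre of gravity, which agrees with the description of the curve as stretched to the right.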

The effect on the curve of the j term is to stretch the
curve to the right, heaping it on the left at the same time, the
sort of figure which is indicated in the second diagram on p. 82.
Actual examples of the curve for different values of c and j
are given on pp. 21, 28.


The tables give the integral for the argument $x/c$, not for
x; and before they can be used the observations must be


* See Burgess 9 * Mathematical Stablest Memmatfs Least Squares, n 18G> 
^owhys Elements of Statistics, p. 281, and p, 332 (2nd Edition); and Journal 
of the Jtoyal Statistical Society, 

f See Elements of Statistics, p. 331. Hence, OQ 9 ^OQ, = w lne]i 

gives results on p, 27 and p. 29. 



reduced to the centre of gravity as origin and c as unit.
Then if we find in the table that the integral function, e.g., is
·455 when the argument is 1·1987, we are to understand
that ·455 of the whole area stands on the axis of x between 0
and 1·1987 of the modulus. The tabular statement then shows
the various fractions of the whole observations which may be
expected (in an infinite number of experiments) to lie between
the most probable value and various values with an assigned
deviation from the centre. Thus with the symmetrical curve
of error, one-quarter of the observations may be expected to
be above the most probable value by not more than ·47 of the
modulus, one-third by not more than ·68 of the modulus; all
but 2 per 1000 are separated by less than 2·2 of the modulus
from the most probable value; the chance of a deviation of
5 times the modulus is less than 1 in a billion.
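The figures quoted from the tables can be checked with a short sketch. It assumes that the tabulated integral is (1/√π)∫₀ʳ e^(−t²) dt, with r measured in moduli, which is half the error function; the function names are illustrative only.

```python
import math

def area_from_centre(r):
    """Fraction of the symmetrical curve of error standing between the
    centre and a deviation of r moduli: (1/sqrt(pi)) * int_0^r exp(-t^2) dt."""
    return 0.5 * math.erf(r)

def chance_beyond(r):
    """Chance of a deviation of more than r moduli on either side."""
    return math.erfc(r)

for r in (0.4769, 0.68, 1.1987, 2.2, 5.0):
    print(f"r = {r:<7} area from centre = {area_from_centre(r):.4f}  "
          f"chance beyond = {chance_beyond(r):.3g}")
```

The first three lines of output correspond to the quarter, the third and the ·455 of the text; the last two give the chances of exceeding 2·2 and 5 moduli.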

Supposing we are given a set of observations which we
have reason to suppose should arise from the distribution
defined by the symmetrical curve of error, what particular
curve of error are we to fit to our observations? The problem
is not very important in itself, but the method of solution is
very similar to the method which underlies the principle of
least squares and of several other formulae. The only things
which we have a possibility of choosing are the abscissa of the
centre of gravity and the modulus.

Let $x_1, x_2, \ldots x_n$ be the deviations of the observations
measured from their average. The separate chances that these
should arise if the equation of distribution is
$y = \frac{1}{c\sqrt{\pi}}\,e^{-(x-k)^2/c^2}$ are
$\frac{1}{c\sqrt{\pi}}\,e^{-(x_r-k)^2/c^2}$,
where r is given successive values 1, 2, . . . n.

The chance that they should occur together in a given group is,
by multiplication,

$\frac{1}{c^n \pi^{n/2}}\,e^{-\Sigma (x_r-k)^2/c^2} = P$ (say).

Now on what principle are we to find out the values of
c and k? Of all the curves of error from which these
observations may be supposed to have arisen there is one curve
from which they would arise with the least improbability; to
find this we have to make P a maximum. k and c are quite
independent. Then the differentials of P with regard to k
and c must each be zero. The first gives that k is zero,* and
the centre of gravity of the observations is the origin. The
second shows that $\tfrac{1}{2}c^2$ is the mean square of the x's.† So
that to choose the normal curve which fits the observations
best, in the sense that they would have arisen from that
distribution with the least improbability, we must take for the
centre of the curve the centre of gravity of the observations,
and for the modulus the error of mean square multiplied
by $\sqrt{2}$.
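The rule just stated can be sketched in a few lines: the centre is the average, and the modulus is √2 times the error of mean square. The function names and the sample figures are illustrative assumptions; `log_chance` is the logarithm of P for a trial centre and modulus, included only to show that the fitted values make P largest.

```python
import math

def fit_curve_of_error(observations):
    """Return (centre, modulus) of the best-fitting symmetrical curve."""
    n = len(observations)
    centre = sum(observations) / n
    mean_square = sum((x - centre) ** 2 for x in observations) / n
    modulus = math.sqrt(2.0 * mean_square)  # c^2 / 2 = mean square
    return centre, modulus

def log_chance(observations, centre, modulus):
    """Logarithm of P, the product of the separate chances."""
    n = len(observations)
    sum_sq = sum((x - centre) ** 2 for x in observations)
    return -n * math.log(modulus * math.sqrt(math.pi)) - sum_sq / modulus ** 2

sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical observations
centre, modulus = fit_curve_of_error(sample)
print(centre, modulus)
```

Moving either constant away from the fitted value lowers `log_chance`, which is the sense in which the observations arise "with the least improbability".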

It will be noticed in the proof that in a sense there is
only one symmetrical curve of error. We can reduce any
curve to the form $y = \frac{1}{\sqrt{\pi}}\,e^{-x^2}$ by suitable choice of
scales for the co-ordinates; but if we are taking two groups
measured in the same unit, for instance, both in inches, or
shillings, or years, then the x axis has concrete units, the unit
distance stands at one inch, one shilling, one year. And if we take
two separate curves both measured in inches, work with the
same unit of abscissa, and make the areas each unity, we do
not get the same maximum ordinate. The finite part of the
curve with the lower maximum ordinate stretches further to
the right and left than the corresponding part of the other.
As long as we deal with concrete quantities we shall find
that the quantity c enters into the shape of the curve; and
the comparison of any two curves is made by means of the
values of c given in terms of the unit of abscissa. The quantity
j is independent of all concrete quantities, and is an absolute
measure of skewness, as already pointed out.


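* $\frac{dP}{dk} = \frac{2\Sigma x - 2nk}{c^2}\cdot P = -\frac{2nk}{c^2}\cdot P = 0$ when k is 0.

† $\frac{dP}{dc} = \left(-\frac{n}{c} + \frac{2\Sigma (x-k)^2}{c^3}\right)\cdot P = 0$
when $c^2 = 2\Sigma x^2 \div n$, since k is 0.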


[Table: the height-statistics compared with the curve of error, j = +·06. Columns: height in inches; r; F(r); Y(r); number per 1,000, calculated; number per 1,000, actual; difference; difference from normal curve. The differences sum to 64 for the skew curve and 80 for the normal curve. The individual entries are not legible in this copy.]


We will now use the height-statistics given on p. 21 as
an example of the method of comparing a set of observations
with the curve of error. In the first place we take the centre
of gravity as the origin, namely, 67·54 inches. The modulus,
found in inches by the method of moments, is therefore
to be taken as the unit. Thus 59 inches is 8·542 inches below
the average, that is, 2·257 times the modulus. The latter
number is entered under r in the second column. All the
others are calculated in the same way. Then turning to the
tables and finding what integral corresponds to the assigned
values of r, in the symmetrical curve of error, we write them
under the heading F(r). So we have that between the
average and 59 inches ·5 of the whole curve is obtained, that
is to say, one-half; in the next line, between average and
60 inches ·498 of the curve is obtained, and so on all the way
down to between the average and 76 inches, when again half
the curve is obtained, correct to the third decimal place. We
should not get the true half till we have gone to infinity,
but the area of the curve beyond does not amount to one per
mille of the whole. In this curve, for example, ·462 is the
probability that the height of a person chosen at random lies
between 67·54 inches and 63 inches, for ·462 is opposite
63 inches. The fraction of the curve is the same as the
probability of the occurrence between the point given and
the average.

The next column, called Y(r), is obtained in a similar
way from tables including the term involving j; the value
of j is taken to be +·06 for reasons given below. The column
following under "calculated" consists of the differences of
the Y(r) column multiplied by 1,000; the numbers so obtained
are the numbers to be expected approximately between 59
and 60 inches, 60 and 61 inches, &c. The following column
"actual" gives the actual occurrences per 1,000 in the same
limits. The following column gives the differences in the
various groups between the calculated and actual numbers.
The greatest divergence is near the centre, where there are
12 more than were calculated. In the last column are given
the differences if I had taken the normal curve instead of the
skew curve. It is seen that by taking the curve as a skew curve
the sum of these differences is diminished from 80 per 1,000
to 64 per 1,000.
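The construction of the F(r), Y(r) and "calculated" columns can be sketched as follows. The skew term used here is an assumption: it is the first-order form (j/3√π)(1 − 2r²)e^(−r²), chosen because it gives a mode at distance jc from the centre of gravity as the text states; the function names are illustrative.

```python
import math

def area_left(r, j=0.0):
    """Area of the (assumed) skew curve to the left of r moduli."""
    skew = j / (3.0 * math.sqrt(math.pi)) * (1.0 - 2.0 * r * r) * math.exp(-r * r)
    return 0.5 + 0.5 * math.erf(r) + skew

def F(r):
    """Area of the symmetrical curve between the average and r."""
    return abs(area_left(r) - area_left(0.0))

def Y(r, j):
    """Area of the skew curve between the average and r."""
    return abs(area_left(r, j) - area_left(0.0, j))

def calculated_per_thousand(r_values, j):
    """Expected numbers per 1,000 between successive values of r."""
    return [1000.0 * (area_left(hi, j) - area_left(lo, j))
            for lo, hi in zip(r_values, r_values[1:])]

# Hypothetical r values, one modulus apart, from far left to far right.
rs = [-6.0, -2.0, -1.0, 0.0, 1.0, 2.0, 6.0]
print([round(v, 1) for v in calculated_per_thousand(rs, 0.06)])
```

With j = 0 the groups are symmetrical about the centre; with a positive j the numbers on the left of the centre are raised and the long tail is thrown to the right.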

I have now a rather difficult point to take with reference
to one of these columns. Theoretically, j is calculated by the
method of moments, from the error of mean cube; but in practice
that does not give good results. A single observation a long
way from the average has a very great effect on the mean
cube. So that if in this number of 1,935 persons we had
included two persons from a nationality where stature was
very low, or where it was very high, we should have instances
at a long way along the group which would not properly
vitiate the comparison of the curve of error, but would have a
very unfortunate effect upon the mean cube. Instead of
having a homogeneous group, we should have a group of
1,933 people from one group and 2 persons from another
group which would not belong to the same curve. There has
been a great deal of discussion as to what should be done
with such abnormal cases. A good way out of the difficulty
is not to calculate j by the above method at all, but to
calculate it by an a posteriori method, to choose that value
of j which makes the misfit least. We have already chosen c
so as to make the improbability least. Let us choose j by
some similar test. The method I have adopted here is due
partly to Professor Karl Pearson, and partly to Professor
Edgeworth. It is to obtain figures (not given here) in such a
form that it can be seen what value of j will make the sum of
the absolute differences least. The value which satisfies this
condition is found to be j = ·06.* The value obtained from the
moments method is -Old. This might have been used and
would have given a result slightly better than the value j = 0.
But I am inclined to say it is better to calculate j from the
a posteriori method; I think it is quite as logical, and you
are bound to get a better fit.
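The a posteriori choice of j can be sketched as a search over trial values for the one that makes the sum of the absolute differences least. Everything here beyond that principle is an assumption made for illustration: the first-order skew term, the trial values of j, the modulus of 3·6 inches and the group limits are hypothetical, and the "actual" figures are manufactured from the curve itself so that the answer is known.

```python
import math

def area_left(r, j):
    """Area of the (assumed) skew curve to the left of r moduli."""
    skew = j / (3.0 * math.sqrt(math.pi)) * (1.0 - 2.0 * r * r) * math.exp(-r * r)
    return 0.5 + 0.5 * math.erf(r) + skew

def calculated(limits, centre, modulus, j):
    """Numbers per 1,000 expected between successive limits."""
    rs = [(x - centre) / modulus for x in limits]
    return [1000.0 * (area_left(hi, j) - area_left(lo, j))
            for lo, hi in zip(rs, rs[1:])]

def misfit(actual, limits, centre, modulus, j):
    """Sum of the absolute differences, calculated against actual."""
    calc = calculated(limits, centre, modulus, j)
    return sum(abs(a - c) for a, c in zip(actual, calc))

def choose_j(actual, limits, centre, modulus, trial_values):
    """Trial value of j giving the least misfit."""
    return min(trial_values,
               key=lambda j: misfit(actual, limits, centre, modulus, j))

limits = list(range(59, 78))                      # inches
actual = calculated(limits, 67.54, 3.6, 0.06)     # manufactured data
trials = [k / 100 for k in range(0, 11)]          # j = 0, .01, ... .10
print(choose_j(actual, limits, 67.54, 3.6, trials))
```

A single outlying observation alters one group by one unit in this test, whereas it enters the mean cube with the cube of its distance, which is the reason given for preferring this method.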

Professor Karl Pearson has given a test by which you
can consider the following problem: Supposing you had a
population with certain characteristics, such as height,
distributed according to a curve with a particular formula,
required the probability that an assigned distribution would
be obtained from the supposed distribution. Putting it into
a more concrete way, suppose the equation of the height
group for the whole population was this equation with
the modulus c as found above, and j = ·06: required the
probability that 1,935 persons taken at random from the
population would have the heights actually registered.
Professor Karl Pearson has given a table† with the necessary
figures for determining that probability. Calculation from his
table on this distribution shows that if we take the symmetrical
curve the probability of obtaining such a selection is ·4; that is
to say, the chances are two in five that the 1,935 persons would
not be further from the supposed distribution than they actually
are. If we take the skew curve with j = ·06, the probability is ·7;
that is to say, the odds are seven to three that we should

* See Journal of the Royal Statistical Society, June 1902, pp. 337-8.

† See London, Edin. and Dublin Phil. Mag., July 1900, p. 175.



obtain 1,935 persons as nearly conforming to this group as
we have found. It is very difficult to argue back from the
height of a person to the terms of a binomial expression, and I
shall not at present attempt it. I have shown above that we
should obtain this formula of the curve of error if we were
dealing with chances, with events whose occurrence was given
by those terms, in the binomial theorem. But the same
equation will be obtained on very many other suppositions,
and I have only taken the simplest. Before giving those,
however, it is necessary to define a "frequency curve."

If we are dealing with a group of measurements which are
distributed about their average so that the number of them
which lie at any defined distance from their average, say
between x and (x + dx) in excess of it, can be represented by
a definite function, say f(x), of that distance, then the curve
which represents this function, i.e., y = f(x), is the frequency
curve of that group. If the unit of ordinate is so chosen that
the whole area contained between the curve, the ordinates at
its extremities, and the axis of x, is unity, then
$\int_a^b y\,dx = 1$ if a and b are the limiting values of x; in many
cases a and b are $\mp\infty$. Then if the quantity is selected at
random from the group, the probability that it will lie between
$x_1$ and $x_2$ is $\int_{x_1}^{x_2} y\,dx$; the probability that it will lie
between x and x + dx is y dx.

If we take the experiment I instanced at the beginning,
the tossing of a coin, and make the number of times tossed
great, the chance of obtaining given deviations would
be given by the curve of error, as already shown. This is the
frequency curve for the group of experiments. Events are
ruled by very different laws of distribution. We may have
a very skew curve, as, for instance, in the curves of ages of
wives in Yorkshire where the mode was a long way to the left
of the average; the smooth curve which best fits those
observations would be the curve of frequency for the ages
of such persons. That is to say, if we draw this curve,
representing as nearly as possible the observed facts, and we
make this area equal 1, the area standing on the part of the
axis between the 35 and 40-year marks would represent
the chance of a person taken at random being between
35 and 40 years old. If we were given the age of a man
who had a wife in Yorkshire and we did not know her
age, that area would represent the chance that her age
would be between 35 and 40. The life curve, to take
another example, is a frequency-curve. To any frequency-curve
we can assign a modulus calculated from the second
moment. That tells one distinct fact as to the distribution
about the average. The curve may have the greater part of
its area to the left or to the right of the average, and
it may have an asymptote as in the case of the curve
of error; but there is in general only a small fraction
of the area beyond two or three times the modulus, which
may therefore be taken as indicating the practical extent of
the curve. It is often useful to speak of the precision (h),
instead of the modulus (c), where $h = 1/c$. The greater h is, the
more precise are the predictions that can be made as to a
magnitude taken at random.

If we are dealing with frequency-curves whose practical
range is small and whose modulus is finite, and if we take a
great number of these frequency-curves, or rather if we have
to select from a great number of things whose sizes are ruled
by different frequency-curves, for example, if we make up a
line of a great number of pieces of metal taken from different
heaps with different frequency-curves for each heap, it is
possible to find the frequency-curve for the sum of those
elements, that is for the length of the line you have made. I
will put that in different form with a different illustration.
Suppose we are going to take 100 books, and we can select
them from 100 different groups of books whose thicknesses
are bounded within definite ranges and have a different
modulus which can be assigned, required the breadth of
100 books put together. The most probable breadth will be
that obtained by adding the averages of the 100 different
groups. From the terms of the question it is obviously very
improbable we shall get all the 100 below the averages of
their respective groups or all above. The actual breadth will
have a frequency-curve of its own about an average which is
the sum of the averages of the groups from which you select.
Its modulus can be shown to be the square root of the sum of
the squares of the moduli of the original frequency-curves.
Thus, to take a special case, if we are going to select two
things only which obey normal curves with the same modulus,
the modulus for the sum is $\sqrt{2}$ times the modulus of either.
The developments from this theory are of great practical
importance.
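The rule for the sum can be written out directly: the averages add, and the moduli combine by the square root of the sum of their squares. The figures below are hypothetical thicknesses in inches, and the function name is illustrative.

```python
import math

def sum_of_groups(averages, moduli):
    """Most probable total and modulus of the total, one item per group."""
    total = sum(averages)
    modulus = math.sqrt(sum(c * c for c in moduli))
    return total, modulus

# 100 hypothetical groups of books, each averaging 1.2 in. with modulus 0.3 in.
total, modulus = sum_of_groups([1.2] * 100, [0.3] * 100)
print(total, modulus)
```

The total grows with the number of books, but its modulus only with the square root of that number, so the breadth of the row is relatively far more certain than the thickness of any one book.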

If we take one sample at random from each of a number
of these frequency-curves whose moduli are not very unequal,
so that no one curve predominates, and add together the
quantities so obtained, then the quantity obtained obeys the
curve of error itself, whether the original frequency-curves
were curves of error or not. I cannot give the proof here;
the theorem as I state it is partly due to Laplace and partly
due to Professor Edgeworth.* That is one of the most general
statements of the cases in which the curve of error will arise;
and that conception may properly be applied to the conception
of height and the causes which determine the person's height.
No single cause has very great influence compared with others,
so far as we know, and they all presumably have measurable
effects whose frequency-curves are definite. Thus, we might
expect a priori the frequency-curve of heights to be the curve
of error.

Another illustration is supplied by the grouping of school
children in a particular grade.† I took one of the most
populous grades in the Report of the St. Louis Public Schools,
U.S.A., grouped the children according to their ages, and fitted
the curve of error by one of the methods I have described.
The curve of error with c = 1·68, j = ·073, fits the observations
closely. If we think of the causes which determine the
position of a child in a particular grade or class, I think we
shall find that they are akin to those I have supposed in
my statement as to causes which lead to the asymmetrical
curve of error. But it would be absurd to go back and try
to re-value p, q and n, the quantities on which the algebraic
proof of the equation depended. We could find out, of course,
what chances would produce this particular distribution; but
they would have no necessary relation to the facts. The idea
I wish to give is that we can obtain the equation of the curve
of error in the form I am using it on a very simple supposition;
and it can be obtained from many other suppositions which
cannot be given in lecture work.

* See Edgeworth, in London, Edin. and Dublin Phil. Mag., 1892, p. 420.
† For the numbers and diagram, see Elements of Statistics, 2nd Edition, Appendix.

Suppose that we make a great many measurements of the
same quantity by several different methods; and that, as is
generally the case, the measurements differ from each other,
owing to imperfections of instruments, or by the numerous
accidental circumstances that attend any involved observations.
Let us assume that the measurements which could be made by
the first method are grouped according to the frequency-curve
$y = \frac{1}{c_1\sqrt{\pi}}\,e^{-x^2/c_1^2}$, those by the second method
according to $y = \frac{1}{c_2\sqrt{\pi}}\,e^{-x^2/c_2^2}$, and so on, a definite
normal curve for each method. Suppose we make n measurements, one
of each kind. It is required to find what is the most probable
value of the distance to be measured. All the x's we are dealing
with are errors in our measurement. From the series of partly
erroneous measurements it is required to find the most probable
value. That is the problem it is attempted to solve by the
method of least squares. As a second question, it is required
to determine the precision of the result, that is, to state
the probability that it is correct within assigned limits.

Before going further I must call attention to one very
important point in the reasoning. In the reasoning on which
the method of least squares is based it is assumed that the
frequency-curves are normal curves of error, as written above.
If the frequency-curve is not a normal curve of error the
method breaks down at the first step. That I shall have to
return to later. With regard to the moduli, we may either
suppose that we know them by some a priori method, as is
sometimes the case; or that we know them by having made
similar experiments at some other time, as if we are dealing
with a group of height measurements where the modulus is
three inches generally; or we may find them from the
experiments themselves. A useful way is to repeat the
measurement by each method, say, 100 times, and from the internal
evidence find out what the moduli are. We assume that the
moduli are fixed quantities, quantities which we cannot affect,
and that they are known or previously determined quantities.

What is the probability that a certain series of errors should
result in n observations? Let $x_1, x_2, \ldots x_n$ be the differences
from the unknown true value which arise from n different
methods taken in one series; what is the probability that
those particular n deviations will occur at once? The
probability is obtained by multiplying together the probabilities of
their separate occurrences. The probability of the error $x_1$
occurring, when the modulus is $c_1$, is from its curve of
frequency $\frac{1}{c_1\sqrt{\pi}}\,e^{-x_1^2/c_1^2}$. The probability that the n
will all occur is obtained by multiplying n such quantities
together, that is,
$\frac{1}{\pi^{n/2}\,c_1 c_2 \ldots c_n}\,e^{-\Sigma x^2/c^2}$. Here the only
variables are the x's. Now, that probability will be greatest when
the index of e is greatest, that is when $\Sigma\frac{x^2}{c^2}$ is least.
Thus, from all the possible values of the unknown true measurement,
the system of errors which we have found would arise with the
least improbability when $\Sigma\frac{x^2}{c^2}$ is made the least possible.
That is the statement which is at the basis of the method of
least squares. In the particular case, when we take all the
observations by the same method with the same curve of
frequency, so that c is the same for all the observations, the
minimal condition is satisfied when the sum of the $x^2$ is a
minimum; and we have already seen that that sum is made
least when the unknown value is taken to be the arithmetic
average of the obtained values. Let me re-state this theorem
in other words. Suppose we start to measure a particular
object by the same method again and again. Then, the
measurements we obtain would come with the least
improbability when the sum of the squares of the deviations is a
minimum; and that condition is satisfied if we take the
arithmetic average of our measurements to be the unknown
true quantity. This statement is a particular case of the
method of least squares.
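Making Σ x²/c² least for a single unknown quantity has a simple closed form: differentiating with respect to the unknown gives an average of the measurements in which each is weighted by 1/c². A minimal sketch, with hypothetical measurements and moduli and an illustrative function name:

```python
def most_probable_value(measurements, moduli):
    """Value v that minimises sum((m - v)^2 / c^2) over the measurements."""
    weights = [1.0 / (c * c) for c in moduli]
    return sum(w * m for w, m in zip(weights, measurements)) / sum(weights)

def weighted_sum_of_squares(value, measurements, moduli):
    """The quantity sum(x^2 / c^2), x being the error of each measurement."""
    return sum(((m - value) / c) ** 2 for m, c in zip(measurements, moduli))

measurements = [10.0, 12.0]   # hypothetical results of two methods
moduli = [1.0, 2.0]           # the second method is half as precise
v = most_probable_value(measurements, moduli)
print(v, weighted_sum_of_squares(v, measurements, moduli))
```

When all the moduli are equal the weights cancel and the result is the arithmetic average, which is the particular case described in the text.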

When we have grasped that initial principle, the rest of
the investigation is only a matter of the differential calculus;
there is nothing special about it. We have to write down all
the equations that connect the quantities we are measuring,
and then by the ordinary processes of the differential calculus
express the conditions that the sum of the squares of the
errors shall be a minimum, and these will give enough
equations to solve for all our unknowns. I will illustrate that
algebraically by a particular case. Take the case with which
we have already dealt, namely, that in which we had the ages
of the wives in Yorkshire. There we obtained a somewhat
irregular curve representing the numbers at different ages,
and we smoothed that curve by putting parabolic curves of
the fourth degree through various points; and it will be
remembered that we had to change the constants in our
equation according to the particular group of five points
selected. Now let us assume that we have a parabolic
equation of the third degree in this form,

$y = a_0 + a_1 x + a_2 x^2 + a_3 x^3.$

This equation has four unknowns; we can therefore make it
pass through any four assigned points, but we cannot make it
pass through five assigned points. Suppose that we wish to
determine an equation of the third degree which will pass near
the five points, then we will apply the method of least squares to
that problem. Let the co-ordinates of the actual observations
be $(x_1, m_1)$, $(x_2, m_2)$, and so on. Let the corresponding points
which we are to find on this particular curve be $(x_1, y_1)$,
and so on. The point $(x_1, y_1)$ will be near, but probably not
coincident with, the point $(x_1, m_1)$. The difference between
$m_1$, the observation, and $y_1$, which would be given by the
curve which we have not yet determined, is the error of the
observation. We are to determine the constants so that the
sum of the squares of those errors shall be least. Writing
that a little more fully, and substituting for y in terms of x,
we have that

$\Sigma\,(m_1 - a_0 - a_1 x_1 - a_2 x_1^2 - a_3 x_1^3)^2$

is to be a minimum.

In that expression the variables are the four a's, which
have to be determined so as to make the expression a
minimum. Therefore we must differentiate that expression,
when it is written out, with respect to $a_0, a_1, a_2, a_3$, and
equate these partial differential coefficients to zero, obtaining
as many equations as we have unknowns. Then we have to
solve the equations so obtained.

After a little simplification the following equations are
obtained:

$5\,a_0 + \Sigma x_1\,a_1 + \Sigma x_1^2\,a_2 + \Sigma x_1^3\,a_3 - \Sigma m = 0$
$\Sigma x_1\,a_0 + \Sigma x_1^2\,a_1 + \Sigma x_1^3\,a_2 + \Sigma x_1^4\,a_3 - \Sigma mx = 0$
$\Sigma x_1^2\,a_0 + \Sigma x_1^3\,a_1 + \Sigma x_1^4\,a_2 + \Sigma x_1^5\,a_3 - \Sigma mx^2 = 0$
$\Sigma x_1^3\,a_0 + \Sigma x_1^4\,a_1 + \Sigma x_1^5\,a_2 + \Sigma x_1^6\,a_3 - \Sigma mx^3 = 0.$

The chief thing I want to say about these equations is,
that they are so complicated, and a solution is so laborious,
that they must be put out of court for all ordinary
calculations. If you wish to construct a new table which will
be of some general use, it may be worth while to go through
the solution, but not for any single practical piece of work.
Every one of those separate terms, $\Sigma x_1$, &c., has to be
calculated arithmetically, and the equations have to be solved.
Even in this simple case we have four equations each containing
four functions. In Merriman's "Method of Least Squares,"
the simplest methods for that evaluation are given. Many
terms drop out, and the evaluation is possible; and in some
cases we can so choose our origin and take advantage of
certain points of symmetry in the equations, that the work
can be simplified. In this particular case a simple solution
has been given by Professor Darwin.*
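The four equations can be formed and solved mechanically. The sketch below builds the sums of powers for five points and solves by ordinary elimination; it is a plain illustration of the equations as written, not the abbreviated method of Merriman or of Darwin, and the five points are hypothetical.

```python
def normal_equations(xs, ms, degree=3):
    """Coefficient matrix and right-hand side of the least-squares equations."""
    size = degree + 1
    # Row i, column k holds the sum of x^(i+k); the first entry is the count.
    matrix = [[sum(x ** (i + k) for x in xs) for k in range(size)]
              for i in range(size)]
    rhs = [sum(m * x ** i for x, m in zip(xs, ms)) for i in range(size)]
    return matrix, rhs

def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting, then back-substitution."""
    n = len(rhs)
    rows = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for k in range(col, n + 1):
                rows[r][k] -= factor * rows[col][k]
    a = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(rows[i][k] * a[k] for k in range(i + 1, n))
        a[i] = (rows[i][n] - tail) / rows[i][i]
    return a

# Five hypothetical points, with the origin taken at the middle one.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ms = [3.1, 1.9, 1.2, 2.8, 9.1]
matrix, rhs = normal_equations(xs, ms)
print(solve(matrix, rhs))   # a0, a1, a2, a3
```

Taking the origin at the middle point makes every sum of an odd power of x vanish, which is the kind of symmetry the text mentions as shortening the work.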

Fitting Formula to Observations. 

Before we look for another way, let us consider again
whether the assumptions on which the above method depends
are justifiable, or will justify the great effort which would be
* See Darwin, "On Fallible Measures," London, Edin. and Dublin Phil.
Mag., July 1877; used in Elements of Statistics, pp. 256, 257.



necessary to solve the equations. I think it will be found
that in general they do not. If we look back through the
argument, it will be seen that the original assumption is that
the difference between m, the actual number of persons
observed, and y, the number obtained from the equation,
belongs to the normal curve of frequency; and so in every
case where the method of least squares applies we have an
observed measurement, and we obtain a theoretical
measurement, and we assume that the difference between the two
belongs to a normal curve of frequency. Before we can
make that assumption we must verify that the conditions, under
which the normal curve of frequency is obtained, are satisfied.
We are not in a position to do that, if we depend only on the
algebraic proof given above, without investigating the
deductions of the equation of the curve of error resting on
other hypotheses. But to my mind there is no proof yet
given which does show that the normal curve of error will be
obeyed in the circumstances I have just mentioned; and
Professor Karl Pearson has shown that in very many instances
the normal curve is not obeyed. So the theory is at any rate
difficult to establish a priori, and is not supported by universal
experience. I think, with all the deference that is due to
Professor Karl Pearson, that the matter yet wants more
practical experience before it can be fully decided. It would
be unsafe in the present state of the argument on the one
hand to say that the normal curve of frequency may be
expected; or on the other hand to say definitely that it is
not to be expected, because it has not been universally
found. That is too difficult to deal with at all thoroughly
here. The reason I have gone so far into it is this: if the
method of least squares is very difficult to apply, and if it is
neither supported sufficiently by theory nor by experiment,
then it seems expedient to try some other method. A purely
empirical method would be this: Instead of making the sum
of the squares of the deviations a minimum, make the sum of
the first powers of the deviations, all reckoned as positive, a
minimum, that is to say, remove the square outside the
bracket in the expression on p. 18. But it is not at all easy
to make that sum a minimum, because all the terms have to
be taken as positive, and we do not know until we have finished
our work which terms are naturally positive or which terms
are negative. Professor Edgeworth has given a method of
getting the solution when there are only two unknowns.*
When there are three unknowns I believe there is as yet no
practical solution.

Another method, still taking the method of least squares
as the basis, but avoiding the very complex solution, is to
choose the coefficients so that the curve will pass exactly
through four assigned points; then re-calculate them, so
that the curve shall pass exactly through four other assigned
points; and so continually calculate the coefficients again and
again, getting a series of curves. Then from the various values
of the coefficients so found, choose those coefficients which
appear to give the best results. It is really a makeshift
method. I think it has been often employed, and the results
have been very satisfactory. If, by one method or another, you
get coefficients which make the theoretical curve pass near the
original curve, it does not matter by what process you have got
them. Such a method as that, I think, is in general use for
approximating to the population in inter-censal years. I think
the Census Office has never published this method; but as far
as I can find out, the method employed is as follows: Supposing
certain points represent the population at the various dates at
which it is exactly enumerated, then if, as a first hypothesis,
we assume that the population increases in geometric progression
between two enumerations, we obtain a simple curve passing from
one point to the next. Then assume again that from this Census
to the next there is another increase in geometric progression,
and we find that the two curves never have exactly the same
constants. Then obtain some method for passing from one curve
to the other without a sudden break of curvature, reject the
parts of the curves near the Census years, and replace them by a
curve which gradually passes from one to the other. That is a
purely empirical method, and I think it is the one adopted. It
is in some such way as this that we can go to work if the method
of least squares is too complicated.
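The first hypothesis in the inter-censal method, growth in geometric progression between two enumerations, can be sketched as follows. This is only an illustration: the census counts are invented, the function name is hypothetical, and the smoothing of the join between successive curves described in the lecture is not attempted.

```python
def geometric_interpolate(first_count, second_count, years_between, years_elapsed):
    """Population `years_elapsed` years after the first census, assuming
    a constant yearly ratio of increase between the two enumerations."""
    yearly_ratio = (second_count / first_count) ** (1.0 / years_between)
    return first_count * yearly_ratio ** years_elapsed


# Hypothetical enumerations ten years apart (not taken from any Census).
census_a = 1_000_000
census_b = 1_210_000

# Estimated population for each inter-censal year, 0 to 10 years after census_a.
estimates = [geometric_interpolate(census_a, census_b, 10, t) for t in range(11)]
```

With these invented figures the estimate half-way between the two dates is the geometric mean of the two counts, not their arithmetic mean.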

* See Edgeworth, “On Reducing Observations,” Phil. Mag., 1888; used
in Journal of Royal Statistical Society, June 1902, p. 341.

The third method, to which I wish to call attention very
particularly, proceeds in quite a different way. We tabulate
our observations as before, and write down the equation of a
curve which is assumed to fit them, with unknown constants;
calculate from the observations the moments (first, second,
third, fourth; as many as there are unknowns) about the
centre of gravity, by the method used above, and calculate
the moments from the assumed curve in terms of the
unknowns. Equating the moments found from the observations
with the moments found for the assumed curve, we have those
equations determining the constants. For example we may
take the case already discussed, when we found a skew
curve of error to fit certain observations. The general
equation to the skew curve of error being given, by the help
of the integral calculus we stated the values of the first,
second, and third moments in terms of c and j; we equated
these to the moments calculated from the observations, and
thus found c and j. We need to calculate as many moments
as there are unknowns in the particular equation selected.
For instance, in Makeham's formula there are four unknowns,
and we have to take four moments. In the normal curve of
error there are two unknowns, its centre and the modulus;
two moments are therefore sufficient to find the normal curve
of error by this test. In the skew curve of error, the quantity
j has to be determined in addition. In the empirical equations
given by Professor Karl Pearson in his well-known paper on
the measurement of skew groups, which was published in 1895
in the Proceedings of the Royal Society, there are four
unknowns, and therefore in general he needed four moments.
In the parabolic interpolations, such as I have used in these
lectures, there are as many unknowns as we like to take. If
we stop at $x^{3}$, we need four moments. In Professor Pareto's
empirical equation for the grouping of the incomes of the
people of a country there are two unknowns. The
equation is as follows:

$$y = \frac{A}{x^{\alpha}},$$

where y is the number of persons in receipt of income x, and
A, $\alpha$ are constant. It is also given in a developed form
with one more constant. It is supposed that the index $\alpha$
is nearly the same for all countries, while A varies from
country to country. You could obtain those values by the
principle of least squares, or by equating moments. This is not
the place to criticise the equation; I only give it as an
example of an algebraic equation for statistical grouping. We
see then how to obtain sufficient equations for the unknown
constants, and so we come naturally to the question of what is
the justification for this method. I think I must refer you in
general, to Professor Karl Pearson's paper for the justifications,
because it is his method, and in particular he has quite recently
published a paper in the journal Biometrika going very
carefully into this whole method; and all I can do is simply to
follow in his steps. The method depends on a purely empirical
basis, not on any a priori theory. By its means we do, as a
matter of fact, obtain an equation which fits the observations.
But, incidentally, Professor Karl Pearson shows that the results
obtained are, in general, the same as those obtained by the
method of least squares. Without basing his system upon
the coincidence at all, he does obtain the same results. The
advantage of the method is, as he has also shown in the
same paper, that the solution of the equations obtained is
very much easier than the solution of equations obtained by
the ordinary method of least squares. I hesitate to go further
into this subject because it is Professor Karl Pearson's subject,
and all his papers are very easily accessible. He has shown
that empirical algebraic formulae can be found for a very
wide range of groups, and in every case he has fitted equations
to the groups by the help of this number of moments. He has
then found that the equations so obtained do fit the groups
exceedingly well. Groups may, perhaps, contain 30, or 40,
or 100 measurements, but the constants at disposal are only
4. If you calculate these 4 constants by any method and
obtain, as a result, equations which fit a wide range of
observations, you have a strong empirical justification for the
method. I believe that is the justification which Professor
Karl Pearson gives for the method. But we are met face to
face with this difficult question, which it is impossible to deal
with here and now: How far ought we in such investigations
to take empirical formulae which are only justified by their
results, and how far should we base our reasoning on a priori
assumptions as to the nature of error, and as to its occurrence,
assumptions which underlie the theory of probability, and
from such assumptions obtain our equations? Should we
obtain our equations with the view to fitting the result, or
should we obtain our equations from a priori reasoning and
see how far they fit the results? To my mind we have not
nearly enough experience in the matter at present. We have
not sufficiently tested the fitting of groups to the a priori
equations, nor have we yet sufficient experience to say that
the empirical method is universally satisfactory because it has
been found to fit wide ranges of groups. At that point I
must leave the discussion.
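The simplest case of equating moments mentioned above, the normal curve of error with its two unknowns (centre and modulus), may be sketched as below. The grouped observations are invented for illustration and the function names are hypothetical; the only relation assumed is that the square of the modulus is twice the second moment about the centre of gravity.

```python
import math


def fit_normal_by_moments(values, counts):
    """Return (centre, modulus) of the curve of error whose first two
    moments equal those of the grouped observations."""
    n = sum(counts)
    centre = sum(v * k for v, k in zip(values, counts)) / n
    second_moment = sum(k * (v - centre) ** 2 for v, k in zip(values, counts)) / n
    modulus = math.sqrt(2.0 * second_moment)  # modulus squared = twice the second moment
    return centre, modulus


def normal_frequency(x, n, centre, modulus):
    """Ordinate of the fitted curve of error for n observations in all."""
    return n / (modulus * math.sqrt(math.pi)) * math.exp(-((x - centre) / modulus) ** 2)


# Hypothetical grouped observations (magnitudes and their frequencies).
values = [0, 1, 2, 3, 4, 5, 6]
counts = [1, 6, 15, 20, 15, 6, 1]

centre, modulus = fit_normal_by_moments(values, counts)
fitted = [normal_frequency(v, sum(counts), centre, modulus) for v in values]
```

Comparing `fitted` with `counts` group by group is then the empirical check of fit that the lecture describes; a curve with more unknowns would need correspondingly more moments.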

Uses of the Curve of Error.

Whatever may be the ultimate decision in the questions
which I have thus stated, there are certainly many uses for
the curve of error in the form in which I gave it in the last
lecture, quite independently of the discussion we have just
been engaged in. In what I have been recently saying I have
been following, as far as possible, Professor Karl Pearson's
method. In what I shall say now I am following Professor
Edgeworth's work. I do not mean that the two are
contradictory in any way; I wish to indicate that I am
trying to summarize the present position of this question on
the lines of the two most eminent authorities in this particular
work. For clearness, I repeat the method of generating the
curve of error given on p. 43. Suppose we have a number of
frequency-curves, each of small and limited range, that is to
say, of great precision, its modulus being small; let the
moduli of n such curves calculated from the squares of the
deviations be $c_1, c_2, \dots, c_n$. The curves may be of any shape,
except that no finite part of their areas may be at a great
distance from their centres of gravity. Suppose we take $a_1$
observations belonging to the first curve, $a_2$ out of the
second, and so on, and add them together; the curve of
frequency for the resulting sum is the normal curve of error
with modulus $\sqrt{\Sigma a c^{2}}$. If instead of taking the sum, we
take any other function to which the sum is the first
approximation, the curve of frequency for the values of
this function is likely to approximate to a normal curve of
error; but we will here limit ourselves to the sum. The
following diagram and the experiment on which it depends
illustrate this theory. I took Chambers' mathematical tables,
and chose three digits at random and took their average, and
repeated this a thousand times. The curve of frequency of
the 10 natural digits is a straight line; you are as likely to
get any one of them as any other, if you select a suitable part
of the tables. I have represented that curve of frequency
by ten dots. It is limited at both ends, its modulus is fairly
great, and it supplies a very severe test of the
principle I have enunciated, because we have a curve of
frequency which is absolutely different from the normal
curve of error; it does not approximate to it in any way
whatever. The actual probabilities of the occurrence of
various numbers are the successive coefficients in the
expansion of $(1 + x + x^{2} + \dots + x^{9})^{3} \div 1{,}000$. Comparing these
with the result of the experiment we have the following
table :

It is not my point here to show that these figures are what
you would expect to get; what I wish to show is, first, that
the successive probabilities, when they are plotted out,
resemble the curve of error; and, secondly, that the experiment
tends to the normal curve of error. In Diagram IX the
continuous line with dots on it is the frequency which you
would expect. The broken line is the curve of error, with the
same area and modulus; and the crosses are the positions
obtained from the actual experiment. It is seen that
though we started with a frequency-curve which was a straight
line, the theoretical curve which we obtained for the
average of only three terms selected from it is already so much
like a curve of error that you would mistake it for one, if a
model was not traced on the paper; and that the actual
experiment supports the same view.

We note that the modulus calculated from the squared
deviations for the natural digits is 4.06, and that from the
formula $\sqrt{\Sigma a c^{2}}$ given above the modulus for the sum of
three digits should be $\sqrt{3 \times (4.06)^{2}} = 7.032$; and for the average
of three digits should therefore be 2.344. The modulus of
the curve given by the calculated probabilities of the various
numbers is 2.345; while that calculated from the results of the
experiment is 2.358. The averages are 4.5 (theoretical) and
4.494 (experimental).*
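The theoretical side of the digit experiment can be checked by direct enumeration. The sketch below counts the 1,000 equally likely triplets of digits, which gives the same numbers as the coefficients of $(1 + x + \dots + x^{9})^{3}$, and compares the modulus of their averages with the value given by the formula for a sum. The helper `modulus` is a hypothetical name; it takes the modulus as the square root of twice the mean squared deviation. On that footing the modulus of the ten digits comes out near 4.06.

```python
import math
from itertools import product

DIGITS = range(10)


def modulus(values, weights):
    """Square root of twice the weighted mean squared deviation."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    second_moment = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return math.sqrt(2.0 * second_moment)


# Modulus of the straight-line frequency "curve" of the ten digits.
modulus_digit = modulus(list(DIGITS), [1] * 10)

# Number of ways each sum 0..27 can arise from three digits.
counts = [0] * 28
for a, b, c in product(DIGITS, repeat=3):
    counts[a + b + c] += 1

# Modulus of the average of three digits, from the exact probabilities ...
averages = [s / 3 for s in range(28)]
modulus_average_exact = modulus(averages, counts)

# ... and from the formula: modulus of the sum is sqrt(3 c^2), then divide by 3.
modulus_average_formula = math.sqrt(3 * modulus_digit ** 2) / 3
```

The two values agree, as they must, since the formula is exact for second moments; what the experiment adds is that the shape of the distribution is already close to the curve of error.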

Construction of a Group from Samples.

The theory which I have just enunciated, for the proof of
which see the reference given on page 44, is, that if we start
with any frequency-curves, and take our examples from them,
one from each or many from one, and take the average, we
shall obtain a curve which becomes more and more like the
curve of error as we extend the number of our examples,
and as the frequency-curves satisfy more and more nearly the
limited conditions which are laid down for them. Now, that
is not only a mathematical theory: it has very great practical
importance. Supposing that we take a number of samples
out of a large group, how near the true average may we
expect to get? If the curve of frequency of the group was
a curve of error, we can at once write down the probability of
different divergencies. If we have a curve of error with
modulus c, and we select n samples at random from it, and
then take their average, the modulus for their sum is, from the
formula already given, $c\sqrt{n}$, and hence that for their average
is $c/\sqrt{n}$; that is, the modulus of the arithmetic average (the
inverse of its precision) varies inversely as the square root of
the number of items, a very well-known principle. I wish to
show how this theory can be adapted to other cases.

* Cf. Edgeworth, in Jubilee Volume of the Journal of the Royal
Statistical Society, p. 186.

Suppose the original curve of frequency to be any curve
whatever, a curve of survivors for example; I make no
assumption about it. Suppose we go through an
experiment, taking, we will say, m examples at random from
it, and repeat the process k times. [In the experiment
just discussed m was only three, and k was 1000.] Though
the original items do not obey the normal curve of error,
yet the average of m of them may be expected to, when m is
sufficiently great. Let c be the modulus for the group
averages of m samples; then $c/\sqrt{k}$ may be expected to be the
modulus for the average of the whole mass of km samples.
Thus, in the above experiment, c was about 2.35, k was 1000, and
$c/\sqrt{k} = .074$; the known average for all digits, which formed
the original curve of frequency, is 4.5, the average for the
3,000 selected, in 1,000 groups of three, was 4.494; the
difference is one-tenth of the modulus just calculated; so small
a difference might be expected once in nine trials.
Thus, whether the curve of frequency of the original group
is the normal curve of error or not, the precision of the
average of a great number of samples is proportional to the
square root of that number.

Now let us see how to construct not merely an average,
but a whole group, by the method of samples.

The table and diagrams give the result of an experiment in
such construction. The material of the experiment is of no
importance here; I merely took the most accessible figures to
conduct the experiment, namely, the Official Gazette prices of
wheat for the 600 months for which they are recorded in the
Statistical Abstracts, and regarded that as a group of things
which I was going to build up by sample. For complete
illustration I had to take a group I knew, and then to take
samples of it. In general, of course, the group is not known,
but has to be constructed from the samples. The actual
group is that given in Diagram X in the continuous lines. To
obtain the samples, I took Chambers' mathematical tables,
and assigned to particular numbers, from 001 to 600,
certain months, and took 100 numbers of three digits at
random. Next, I wrote down the prices in the 100 months
corresponding to those 100 numbers, and grouping them in
10s. groups, obtained the numbers given in the third column
above, and also given by the crosses in the Diagram X (a).
I next selected 25 samples by taking the first 25 of the 100,
and I grouped the figures in 20s. groups, and obtained the
numbers given in the fifth column and by the crosses in
Diagram X (b). What rule have we for deciding how near
the true group the sample is? In the third division, for
instance, between 30s. and 40s., in the whole group, there are
126 instances, and 21 per cent. of the area is between 30s.
and 40s. If we take 100 things at random out of the whole
group, how many of that 21 per cent. are we likely to get?
This is a simple problem in probability: if n samples are
taken, the chances that 0, 1, 2 . . . n will come from a given
part, which is to the whole as is p to 1, are the successive
coefficients of the expansion of $(q + p)^{n}$, where $q = 1 - p$; as
n increases we approximate to a curve of frequency with
modulus $\sqrt{2pqn}$ (see above). In the third division $p = .21$, while
n, the whole number of samples in the first experiment, is 100.
Here $\sqrt{2pqn} = \sqrt{2 \times .21 \times .79 \times 100} = 5.8$. The difference
between the actual number per 100 in the group, namely, 21,
and the number found in the sample, namely, 16, is less than
the modulus. In all the other cases in both experiments the
differences are within the “probable error” (which is .477 of
the modulus, see above). We have thus found a criterion of
the divergencies to be expected between the distribution of
magnitudes in a group of samples and the distribution in the
unknown group from which they arise.
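The criterion just described can be worked numerically. The following sketch computes the modulus $\sqrt{2pqn}$ and the probable error (.477 of the modulus) for a division holding 21 per cent. of the group when 100 samples are drawn, and then sums the exact binomial chances of a divergence smaller than the modulus. The function names are hypothetical and the exact summation is an addition for comparison, not part of the lecture.

```python
import math


def sampling_modulus(p, n):
    """Modulus of the number of samples falling in a part that is to the
    whole as p to 1, when n samples are drawn: sqrt(2 p q n), q = 1 - p."""
    return math.sqrt(2.0 * p * (1.0 - p) * n)


def binomial_chance(k, n, p):
    """Chance that exactly k of the n samples fall in the part."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


p, n = 0.21, 100
expected = p * n                          # 21 per 100 expected in the division
modulus = sampling_modulus(p, n)          # about 5.8
probable_error = 0.477 * modulus          # half of all trials should fall within this

# Exact chance that the sample count differs from 21 by less than the modulus.
chance_within_modulus = sum(
    binomial_chance(k, n, p) for k in range(n + 1) if abs(k - expected) < modulus
)
```

A divergence smaller than the modulus is therefore the common case, and one larger than two or three times the modulus would be a sign that the sample was not a fair one.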

As regards the precision of the averages of the samples, the
modulus for the original group is about 19s., and, therefore, the
moduli for the averages of 100 and of 25 samples, respectively,
are 19s. ÷ √100 = 1s. 11d., and 19s. ÷ √25 = 3s. 10d. The
averages found from the samples are actually 45s. 4d. and
46s. 6d., which are, respectively, 1s. 7d. and 2s. 9d. in excess
of the average of the whole group.

The experiment, therefore, forms a good illustration of the
theory, and on consideration it will, I think, be found that
the theory is in strict accordance with common-sense and
common experience.
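The arithmetic for the precision of the sample averages is short enough to verify directly. This sketch divides the modulus of the group (taken as 19s., as stated) by the square root of the number of samples and expresses the result in shillings and pence, twelve pence to the shilling; the helper names are hypothetical.

```python
import math


def modulus_of_average(group_modulus, number_of_samples):
    """Modulus of the average of a number of samples drawn at random."""
    return group_modulus / math.sqrt(number_of_samples)


def to_shillings_and_pence(shillings):
    """Convert a decimal number of shillings to (shillings, pence),
    rounded to the nearest penny."""
    total_pence = round(shillings * 12)
    return divmod(total_pence, 12)


modulus_100 = to_shillings_and_pence(modulus_of_average(19, 100))  # (1, 11)
modulus_25 = to_shillings_and_pence(modulus_of_average(19, 25))    # (3, 10)
```

Both observed excesses, 1s. 7d. and 2s. 9d., are smaller than the corresponding moduli, which is what the theory leads one to expect in most trials.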



MEASUREMENT OF GROUPS.


FOURTH LECTURE.


Correlation between Two Groups.

Let there be n pairs of measurements $(x_1, y_1)$, $(x_2, y_2)$ and so
on up to $(x_n, y_n)$, the members of each pair having some
determinate connection with each other; for example, suppose
that the x's are the ages of the wives in the group taken
above, and the y's the ages of their husbands, $x_r$ and $y_r$ being
the ages of a married couple. This is the example discussed
below. Or suppose that $x_r$ is the age at which a man dies,
and $y_r$ the age at which his father died; or suppose that $x_r$, $y_r$
are measurements of physical characteristics of the same man.
Or again, $x_r$ might be a death rate, in a year in which $y_r$ was
the average temperature. It is required to measure the
relationship between the x's and y's so as to answer this question:
Given one of the x's, assign the probable value of the
corresponding y. For example, given the age at which a man
died, assign the most probable age to which his son will live.
Or, taking one member of the group of wives at random, state
the probabilities of the age of her husband. We have in fact
to give numerical expression to such statements as these:
A high death rate goes with a low temperature; a long-lived
father has long-lived sons; for all statements where
two measurable quantities are connected in that way, where
in common parlance we connect them with simple adjectives,
we have to find a numerical or mathematical expression for
the relationship. First suppose that there is no causal
connection between two groups. Then if we select any
particular place on the axis on which the x's are measured,
and mark in the corresponding y's, we shall get a group of y's
whose average is equally likely to be above or below the
average of all the y's. Suppose we choose a group of wives,
between the ages 25 and 30, mark in the ages of their
husbands, and mark the average of such ages; if there is no
connection between the ages of the one group and the ages of
the other, the average of the group so taken will be near or
equal to the average of the whole group of husbands, namely,
42 years. And so, if we take another period and mark in the
various ages of the husbands we should again find the average
near the average of the whole group. If the x's are
represented on a horizontal axis, and the y's are measured
vertically by points placed above the values of x which are
their pairs, then if there is no causal connection between the
magnitude of the x's and of the y's, the averages of groups of
the y's corresponding to assigned intervals on the axis of x
will all lie near the horizontal line through the average of
the y's. They will not lie on it, but the best straight line we
can draw near these points will be a horizontal line through
the average; that is obvious as soon as the statement is
understood.

But now suppose there is a causal connection between the
two sets of measurements; suppose, for example, that a high
value of x goes with a high value of y. Then if we start from
the average value of x, which we may assume for the moment
corresponds to the average value of y, and pass to the right
and choose a group at a place above the average for the x's,
the y's which are obtained for that group will be distributed
about an average above the line. And as we continually
mark off the averages for group after group by points, they
will lie on some curve which tends upward to the right from
the origin and downwards to the left. (See for example
Diagram XI.) If, on the other hand, a high value of x went
with a low value of y, there is a change of sign; the series of
averages would go down to the right and up to the left. The
exact method of drawing a line through these points I do not
propose to discuss very minutely. We could draw a smooth
line by the methods discussed in the first lecture, or a
freehand curve. We can either draw a straight line as near
as possible to the dots, or we can draw a curve. I shall
not discuss the general shape of that curve; I shall merely
assume that, from the observation or otherwise, we can draw
that curve. And since in any series of observations the
particular averages are liable to slight displacement, in a
finite number of observations we do not get the most probable
point with each average, and must smooth the line in the way
we have discussed. We may assume an equation, $y = f(x)$,
which gives the average of the y's for the particular values of
x; that is only giving a general form to the statement, that a
value of y is connected with a value of x by a determinate
equation.

This equation, of course, only gives the position of the
averages of the selected groups of y's. Every one of those
groups has its own frequency-curve. If we select again the
ages of the husbands of those wives whose ages are between
25 and 30, we can draw a frequency-curve for that group of
husbands, but the centre of this frequency-curve will no
longer be at the average age of all the husbands, if there is
causal connection between the groups; but as the group taken
is below the average of the wives, the centre of this curve
will be below the average for the husbands. It is not
necessary, in general, to make any attempt to draw this
frequency-curve point by point, but only to take its centre
and in some cases its modulus. Instead of dealing with
arithmetic averages, we may equally well use the medians of
the groups.
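The construction of the curve of averages, one point for each interval on the axis of x, can be sketched as follows. The pairs of ages are invented for illustration and `averages_by_interval` is a hypothetical helper; it returns both the arithmetic average and the median of the y's whose x falls in each interval, since either may serve as the centre of the group.

```python
from collections import defaultdict
from statistics import mean, median


def averages_by_interval(pairs, width):
    """Group the y's by the interval of the given width in which x falls,
    and return {lower limit of interval: (average of y, median of y)}."""
    groups = defaultdict(list)
    for x, y in pairs:
        lower_limit = (x // width) * width
        groups[lower_limit].append(y)
    return {
        lower_limit: (mean(ys), median(ys))
        for lower_limit, ys in sorted(groups.items())
    }


# Hypothetical (age of wife, age of husband) pairs, not taken from any census.
pairs = [(22, 25), (24, 27), (27, 29), (28, 33),
         (31, 32), (33, 36), (36, 38), (38, 43)]

curve_of_averages = averages_by_interval(pairs, 5)
```

Plotting each average above the middle of its interval gives the dots through which the line or curve is to be drawn; with correlated data such as these the averages rise steadily from interval to interval.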

We might take, for example, such a question as this, a very
old question: Has the price of wheat anything to do with the
marriage rate? In such a case as that we plot out the prices
of wheat in different months or years along the axis of x,
and put in ordinates showing the average marriage rate when
the wheat was that particular price, and the direction of this line
or the form of this curve would give, within certain limits
dealt with below, the answer to this question, whether there
was a connection between the two or not. If we do obtain
from our observations that there is a tendency upwards to
the right and downwards to the left, or vice versa, we have
found that there is something common in the system of
causation which produces the two sets of phenomena. We
cannot say that the x's are the cause of the y's, nor vice
versa, but only that the two phenomena are not absolutely
independent.



The Coefficient of Correlation.


We have to find a numerical measure of that dependence.
If the curve that we obtain is a straight line, we have only to
find a means of calculating its inclination. Before proceeding
to this, let us spend a few words on the case when the curve
is not a straight line. Suppose that we have sufficient
observations to determine by experiment and observation the
actual shape of this curve from large groups, we could,
without applying any further theory whatever, establish the
connection between the x's and the y's; the curve can be
plotted out, and given algebraic expression, if possible; and
then we should be able to say that for a particular value of
x the most probable value of y was the one obtained on this
curve. We could have a curve simply from experience, and
use the experience with similar phenomena at another time.
For instance, if we had that experience of the length of the
lives of the children of parents who lived to various ages, we
should be able from this empirical curve to say, if a man's
father lived to a certain age, then the chances of the life of
the son are given by a frequency-curve whose centre was
found from the empirical diagram, and whose shape might
very likely be known also. In many cases, however, the
curve of averages is approximately a straight line. Even if
the approximation is not very exact, it may be useful to
calculate the inclination of the straight line that passes
nearest the averages. Let us suppose that we have the
equation of this line, $y = ax + b$. Consider any observation
$x_r$, $y_r$; if this observation lay exactly on that line, $y_r$ would be
$a x_r + b$. If the observation does not lie on the line, its
distance from it, measured parallel to the axis of y, is
$y_r - (a x_r + b)$. To obtain the best values for a and b, which
are the only unknown quantities, we can proceed* by the
method of least squares, and make the sum of the squares
of such quantities as $y_r - (a x_r + b)$ a minimum. Then the
differentials of $\Sigma (y_r - a x_r - b)^{2} = u$ (say) with regard to both
a and b must be zero.

Thus
$$\frac{\partial u}{\partial a} = 2a\Sigma x^{2} - 2\Sigma xy + 2b\Sigma x = 0,$$
$$\frac{\partial u}{\partial b} = 2nb + 2a\Sigma x - 2\Sigma y = 0.$$

* See below p. 73.



Choose the axes so that both the x's and the y's are
measured from their averages, then $\Sigma x = 0 = \Sigma y$ and the
equations give us $b = 0$ and $a = \dfrac{\Sigma xy}{\Sigma x^{2}}$; the line required passes
through the origin, and its equation is
$$y = \frac{\Sigma xy}{\Sigma x^{2}} \cdot x.$$
Let $\sigma_1$, $\sigma_2$ be the standard deviations of the groups of x's and
of y's, so that $n\sigma_1^{2} = \Sigma x^{2}$ and $n\sigma_2^{2} = \Sigma y^{2}$; and let
$r = \dfrac{\Sigma xy}{n\sigma_1\sigma_2}$: then the above equation becomes
$$y = r\,\frac{\sigma_2}{\sigma_1}\,x,$$
that is,
$$\frac{y}{\sigma_2} = r \cdot \frac{x}{\sigma_1}.$$

In order to make r symmetrical, it has been necessary to
divide by $\sigma_1$ and $\sigma_2$, that is, to measure x and y by their
standard deviations. It is a very natural thing to do. Before
we can get any numerical comparison, we must reduce them
to some common measure, and a common unit which we can
very reasonably adopt is the standard deviation for each of the
two things. If we are dealing with the question I suggested
just now (the marriage rate and the price of wheat) we
cannot compare shillings with a rate per thousand, but we can
compare a ratio of the number of shillings to a standard
number of shillings, with the ratio of the rate per thousand to
a standard rate per thousand. We are then comparing
absolute instead of concrete quantities. We should get
similar equations if we used the modulus instead of the
standard deviation, or the probable errors, or the mean
deviations. For rapid work we could replace the $\sigma_1$ and $\sigma_2$
by the probable errors, which are proportional to the standard
deviations in curves which approximate to the curves of error.
It is to be noticed that we can express the quantity r in the
following form: r is the average of such products as
$\dfrac{x}{\sigma_1} \cdot \dfrac{y}{\sigma_2}$. r is called the coefficient of correlation.* It is not
difficult to show by pure algebra that the quantity r so
determined must lie between +1 and −1†; and that r equals
+1, only if the ratio of every x to its corresponding y is
identically the same as the ratio of every other x to its
corresponding y, so that the ratio y : x is constant, and equal to
$\dfrac{\sigma_2}{\sigma_1}$. If the ratio is constant and $= -\dfrac{\sigma_2}{\sigma_1}$, r becomes −1,
and an increase of x corresponds to a diminution in y. Thus r
is always between +1 and −1, and between these limits
there is a scale of correlation. For instance, we can say that
the correlation between two sets of phenomena is .6 or −.3.
Of course, when one is first introduced to a new scale of any
sort the numbers in the scale convey no meaning; it is a
matter of experience to attach the right value to the different
magnitudes in the scale. Perfect correlation can be understood
from the statement that groups are perfectly correlated if the
deviation from the average of a member of one always equals
the deviation from the average of the corresponding member of
the other multiplied by an assigned constant. If the two things,
marriage and wheat prices, were perfectly (negatively)
correlated, you would be able to establish some such equation
as this: An increase of .1 in the marriage rate is always
found with a diminution of 6d. in the price of wheat. Of
course, such a rigid relation is never obtained unless there is
some physical cause binding the two things together. As the
ratio of corresponding pairs tends to constancy, the correlation
becomes more and more perfect. That must be regarded as a
definition of correlation.

* The last few paragraphs are substantially the same as those given by
Mr. Yule in the Journal of the Royal Statistical Society, 1897, p. 817 seq.

† See Elements of Statistics, p. 319.
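The quantities defined above can be computed directly from a list of pairs. In this sketch (function name hypothetical, data invented) the deviations are measured from the averages, r is taken as the average product of the deviations each divided by its standard deviation, and the least-squares inclination is computed separately so that the relation $a = r\,\sigma_2/\sigma_1$ can be checked.

```python
import math


def correlation_and_slope(xs, ys):
    """Return (r, slope, sigma_1, sigma_2) for paired measurements, where
    slope is the least-squares inclination of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # deviations measured from the averages
    dy = [y - mean_y for y in ys]
    sigma_1 = math.sqrt(sum(d * d for d in dx) / n)
    sigma_2 = math.sqrt(sum(d * d for d in dy) / n)
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    r = sum_xy / (n * sigma_1 * sigma_2)
    slope = sum_xy / sum(d * d for d in dx)
    return r, slope, sigma_1, sigma_2


# Hypothetical pairs with a strong but imperfect connection.
xs = [22, 24, 27, 28, 31, 33, 36, 38]
ys = [25, 27, 29, 33, 32, 36, 38, 43]
r, slope, sigma_1, sigma_2 = correlation_and_slope(xs, ys)

# Perfect correlation: every deviation of y is a constant multiple of that of x.
r_plus, _, _, _ = correlation_and_slope([1, 2, 3, 4], [3, 5, 7, 9])
r_minus, _, _, _ = correlation_and_slope([1, 2, 3, 4], [9, 7, 5, 3])
```

For the two rigidly connected series r comes out as +1 and −1 respectively, and for the invented ages it lies between those limits.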

Now consider the sum of the products of x and y, and
let us write X for $\dfrac{x}{\sigma_1}$, and Y for $\dfrac{y}{\sigma_2}$.

If there were no correlation, if we selected the values of
Y which corresponded with a particular small range of
values of X, we should be likely to find a negative value to
neutralize each positive value of Y, and the products arising
from that range of X's would tend to zero, and the greater
the number of terms the less the distance of their average
from zero. But directly there is any bias towards getting
the positive value of Y for this particular range of X's, as
we increase the terms we may still get negative terms here
and there, but on the whole we shall get positive terms; and
so on, all the way up the scale of X's. When there is
correlation it is clear that the sum of the products tends to be
greater than where there is none. Thus it seems probable
from first principles that the quantity r thus calculated will
make a good measure of correlation.

There is an important caution to be given in the use of
this formula. If, from two series of phenomena which were
absolutely unconnected, we took a limited number of examples,
say a thousand, and worked out the value of r, we should not
obtain exactly zero, or rather the chances are very much
against obtaining exactly zero, even if there was no correlation;
and if we took a very small number of examples the chances
are very much against obtaining anything near zero. As we
increase the number of samples, if there is no correlation,
the coefficient will tend more and more nearly to zero. What
we require before we can use the coefficient is some criterion
to enable us to know whether the formula is significant,
or whether the actual number might have arisen if there had
been no correlation whatever. Such a criterion is given below
on p. 88.
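The caution can be illustrated by a small experiment on the lines of those made with the mathematical tables. The sketch below draws two wholly unconnected series of random numbers, works out r, and repeats this many times for a very small and for a large number of examples. The seed, the number of trials and the function names are arbitrary choices made for the illustration; it does not reproduce the criterion referred to in the text.

```python
import math
import random


def correlation(xs, ys):
    """Coefficient of correlation of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def average_size_of_r(examples, trials, rng):
    """Average of |r| over repeated trials with unconnected series."""
    total = 0.0
    for _ in range(trials):
        xs = [rng.random() for _ in range(examples)]
        ys = [rng.random() for _ in range(examples)]
        total += abs(correlation(xs, ys))
    return total / trials


rng = random.Random(1903)                     # fixed seed so the run can be repeated
few_examples = average_size_of_r(5, 200, rng)
many_examples = average_size_of_r(1000, 200, rng)
```

With only five examples a coefficient of considerable size arises by chance alone, while with a thousand the coefficient keeps close to zero; a value of r therefore means little until the number of examples on which it rests is taken into account.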
The numbers given are in every case the nearest thousands.


A numerical example I have prepared will put the
calculation in a clearer light. The table here given shows
the numbers (to the nearest thousand) of the wives and
husbands in the County of York in 1901 at various ages,
grouped in periods of five years. For example, if you take
the husbands' ages from 40 to 45 and look along the line you
will see that there are two wives between 25 and 30, seven
between 30 and 35, 25 between 35 and 40, and so on. It
was not practicable to deal with 610,000 cases, and I have
therefore dealt with the thousands only, and approximated
throughout the calculation. The fact that the numbers run
diagonally down the table as they do shows at once there is
correlation. I have taken a case where the correlation is nearly
perfect. If we had a table where the correlation was very small,
we should find the numbers distributed in random fashion all
over the table. In such a list of figures as this it is not very
practicable to take the arithmetic average. It is easier and
as accurate to take the medians. I have approximated to the
medians for all the groups, both horizontal and vertical, by
the methods already explained. To take a particular example,
consider again the husbands who are between 40 and 45 years
of age. If you look along the list you will find there are in
all 78, and that the median age is 40.7 years. Or if you take
a vertical column, if you choose those wives who are between
40 and 45, and look vertically downwards, you will find that
there are in all 77 of them, and that the median age of their
husbands was 43.9.

The diagrams show the medians graphically. In the first
the ages of wives are measured horizontally, those of husbands
vertically. Above the middle point of each five-year period
is placed a dot indicating the median age of husbands whose
wives' ages come in that period. Thus, looking upwards from
the position of 37½ years, the middle age of the group of wives
between 35 and 40, you will find that the dot indicating the
median age of the husbands is placed at 38.9 years. You
see that the points so obtained lie very nearly in a straight
line. At the top and at the bottom the line becomes a
little bit curved, for the influences of the lower and upper
limits of ages make themselves felt. If we tried to get a
normal distribution for the group of wives who are under 20
we should be getting husbands at 13 and 14 years of age, and
at the other end of the scale we should have got husbands
at ages at which there are no people alive. The fact that the
scale is limited at both ends is the cause of the deflection of
that curve from the straight line. We have now to measure
the inclination of that line. In the case I have taken, where



[DIAGRAM XI and DIAGRAM XII. Showing correlation between ages of husbands and their wives. Diagram XI: median ages of husbands plotted against ages of wives; the tangent of the inclination of the line through the dots is .97 (nearly). Diagram XII: median ages of wives plotted against ages of husbands; the tangent of the inclination of the line through the dots is .92 (nearly).]



the correlation is so perfect, there is no difficulty in measuring
the line, because if a straight line is drawn through three or
four of those points it passes very near the others. But in
other cases, it is not so obvious which straight line is to be
drawn; and then we can proceed by the method of least
squares already taken, or you can proceed by the following
practical method which yields good results: Mark out two
lines horizontally and vertically through the averages of the
two groups, and rotate a ruler through their point of
intersection until the same number of dots is found on the one
side of it as on the other. It will be found that that method
gives a definite position of the line which passes very near the
points; it is a purely empirical way; but as the coefficient
of correlation need generally not be calculated with
great minuteness, it will in general be sufficiently correct.
It is often absurd in cases of probability to work out
the results with very great accuracy. The line is not
drawn in the diagram above, because it would have
obscured the dots; but underneath is given the tangent
of the inclination to the horizontal of the line which would
satisfy the conditions; the tangent of this angle is .97.
The second diagram is constructed in a similar way, for the
median ages of wives, whose husbands are in a given
group; the tangent of the inclination of the line through
the points is now .92. The average age of the husbands is
42.16 years, with standard deviation 12.6 years; the average
age of the wives 40.11 years, with standard deviation
12.1 years. The statement we have now obtained
is of this sort: If we are dealing with a man whose age
is h, in excess of the average, and we wish to know
the age of his wife; the value of w in the equation
h − 42.16 = .92(w − 40.11) is nearly the most probable value of
her age. That comes at once from the geometry of the second
diagram. From the first diagram we obtain similarly:
Given the age of a woman as being w, so that the deviation
from the average is w − 40.11, then the median age of the
husband group is 42.24 + .97(w − 40.11). We shall probably
also need to know the curve of frequency for each of those
groups. Unless there is a reason to the contrary, I think in
general that we may assume that the curve of frequency for a
selected group is similar to the curve of frequency for the
whole group from which it was selected. So that we can
calculate the standard of deviation for this particular curve of
frequency when you know the standard of deviation for the
whole group. That is to say, we can ascertain the chance
that the age of the husband of a particular woman is
any assigned number of years above or below the age here
selected. In the little table given above we can actually find
these small curves of frequency; for instance, in the ages of
wives 30 to 35 years of age the curve of frequency for the
husbands goes as follows: 9, 49, 29, 7, 2, 1.

The above is the graphic way of working out the question.
We have now to show its relation to the formula for r, the
coefficient of correlation. The quantity $r\sigma_2/\sigma_1$ in the equation
$D = r\frac{\sigma_2}{\sigma_1}d$ is the quantity evaluated by the diagram as .97,
that is the tangent of the inclination of the line to the axis of x,
where ages of husbands and wives are measured on the axes
of x and y respectively, and $\sigma_1$, $\sigma_2$ are the standard deviations
for wives and husbands. If I had reduced all the measurements
to the standard of deviation beforehand, r, the coefficient
of correlation, would have been the tangent of the inclination
of the line. The question which is the easier to work, decides
which of the two methods you adopt. If you work it as
I have done, with the same scale of years vertically and
horizontally, you would have to say that $r = .97 \times \sigma_1/\sigma_2$, and
from the lower diagram that $r = .92 \times \sigma_2/\sigma_1$; whence r is the
geometrical mean between .97 and .92, between the tangents
of the inclinations of the line calculated on the two different
hypotheses, namely, .945; and $\sigma_1/\sigma_2 = .974$.
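The arithmetic of the graphic method can be checked in a few lines. This is only a sketch of the step just described; the variable names are illustrative and are not part of the lectures.

```python
import math

# Slopes read off the two diagrams (the values quoted in the text).
slope_diagram_xi = 0.97    # stated in the text to equal r * sigma2 / sigma1
slope_diagram_xii = 0.92   # stated in the text to equal r * sigma1 / sigma2

# Multiplying the two slopes cancels the ratio of the standard
# deviations, so r is the geometric mean of the slopes.
r = math.sqrt(slope_diagram_xi * slope_diagram_xii)

# Dividing r by the first slope recovers sigma1 / sigma2.
sigma_ratio = r / slope_diagram_xi

print(round(r, 3), round(sigma_ratio, 3))
```

The rounded results agree with the .945 and .974 quoted above.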

Now let us proceed to calculate r by the formula $r = \frac{\Sigma xy}{n\sigma_1\sigma_2}$.

It is, of course, a long business, and I shall not give the work
completely; I shall only indicate the way in which it was done.
The problem was to find that product for 610,000 pairs, which,
of course, is a prohibitive piece of work, and cannot be done
accurately, because the ages are not given except in 5 yearly
limits. We proceed by approximation. First of all, I neglect
all the numbers below 1,000; secondly, I assume that the
numbers left are at the middle of their respective groups.
Then I deal with the 60 or 70 numbers in the table on p. 67 in
the following way. Select a group, e.g., wives whose ages are
25 to 30; the middle, 27½, is 12.6 years below the average age
of all wives; express this and other deviations in terms of the
standard deviation, 12.12; 12.6 ÷ 12.12 = 1.04. That is the y
term to be applied throughout this group of husbands. Go
through a similar process for the x's. This 9 is at the middle
of the group 20-25 years, namely, 22½, which is 19⅔ years
below the average age of all husbands, and that in terms of
the standard deviations is about 1.6; work out the other
deviations, which are in arithmetic progression, in the
same way. Then multiply the numbers in the group, 9, 49, 31,
7, 2, 1 each by its deviation, add, and multiply the sum by the
deviation 1.04 for the group.

EXAMPLE:
Some of the resulting terms will be negative unless the
correlation is considerable. Add those terms, and divide the
sum by n (in this case 602), and the coefficient of correlation
.96 is obtained. In the method I suggest using we do not
deal with any large numbers at all. The number is not far
from the geometric mean of .945 found by the graphic method
above. Also $\sigma_1/\sigma_2 = .96$ (instead of .97 as reckoned above),
$r \times \sigma_1/\sigma_2 = .92$ (the same as above), $r \times \sigma_2/\sigma_1 = 1.0$ (instead of .97).
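The procedure just described (group midpoints, deviations in standard units, products summed and divided by n) can be sketched as follows. The small table is hypothetical and is not the Yorkshire table; the function name and arguments are likewise illustrative assumptions.

```python
import math

def grouped_correlation(table, row_mids, col_mids):
    """Product-moment r from a two-way frequency table.

    table[i][j] is the number of pairs whose first variable lies in
    the group with midpoint row_mids[i] and whose second variable
    lies in the group with midpoint col_mids[j]. Every item is
    assumed to sit at the middle of its group.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(len(col_mids))]

    mean_x = sum(f * m for f, m in zip(row_tot, row_mids)) / n
    mean_y = sum(f * m for f, m in zip(col_tot, col_mids)) / n
    sd_x = math.sqrt(sum(f * (m - mean_x) ** 2
                         for f, m in zip(row_tot, row_mids)) / n)
    sd_y = math.sqrt(sum(f * (m - mean_y) ** 2
                         for f, m in zip(col_tot, col_mids)) / n)

    total = 0.0
    for i, row in enumerate(table):
        # Deviation of this group in units of the standard deviation.
        x_std = (row_mids[i] - mean_x) / sd_x
        # Multiply each number in the group by its own deviation, add,
        # then multiply the sum by the deviation for the group.
        inner = sum(f * (col_mids[j] - mean_y) / sd_y
                    for j, f in enumerate(row))
        total += x_std * inner
    return total / n

# Hypothetical 3 x 3 table with five-year groups.
table = [[8, 2, 0],
         [2, 6, 2],
         [0, 2, 8]]
mids = [22.5, 27.5, 32.5]
r = grouped_correlation(table, mids, mids)
print(round(r, 3))
```

Because every deviation is expressed in standard units before multiplying, no large numbers enter the sum, which is the convenience claimed for the method.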

Justification of the Formula for r.

In the method of finding the formula for r on page 64, we
used the method of least squares without examining its
suitability. I will now give reasons, which have not, so far
as I know, been previously offered, in favour of this method.
If the figures we are dealing with belong to the normal curve
of error, there is no difficulty. If their curve of frequency
has any other form, still the averages of selected groups,
represented by the dots in Diagrams XI and XII, are governed
by normal curves of error (see page 50). Let $y_1, y_2, \ldots, y_m$
be the averages of groups of y's, containing respectively
$k_1, k_2, \ldots, k_m$ items, whose x values are $x_1, x_2, \ldots, x_m$; so that
the $k_r$ x's which, in the grouping of x's adopted, are in a small
group whose centre is $x_r$, have y-pairs whose average is $y_r$.
Let $y = ax + b$ be a line which contains the values of y from
which the observed $y_1$, $y_2$, &c., are deviations. Then
$(y_r - ax_r - b)$ is a quantity whose frequency-curve is a normal
curve of error, the modulus of which is inversely proportional
to $\sqrt{k_r}$, $k_r$ being the number in the $x_r$, $y_r$ group. The
probability of such deviations occurring together is (as on
page 40) a maximum, when $\Sigma k_r (y_r - ax_r - b)^2$ is a minimum.
Equating the partial differentials of this sum with reference
to a and b to zero, and remembering that $\Sigma k_r x_r = 0 = \Sigma k_r y_r$,
if the deviations are measured from the general average, we
have, as on page 65, $b = 0$, and $\Sigma k_r (x_r y_r - a x_r^2) = 0$. Hence,
$a = \frac{\Sigma k_r x_r y_r}{\Sigma k_r x_r^2} = \frac{\Sigma xy}{n\sigma_1^2}$, where the summation extends
over all the pairs. Then, as before, $r = \frac{a\sigma_1}{\sigma_2} = \frac{\Sigma xy}{n\sigma_1\sigma_2}$.
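The two conditions quoted above can be written out step by step. The following is a sketch of the intermediate algebra only, with $k_r$ the number of items in a group, $x_r$ its centre and $y_r$ its average, and with $S$ introduced here merely as a name for the sum to be minimised.

```latex
S(a,b) = \sum_r k_r\,(y_r - a x_r - b)^2 ,
\qquad
\frac{\partial S}{\partial b} = -2\sum_r k_r\,(y_r - a x_r - b) = 0 ,
\qquad
\frac{\partial S}{\partial a} = -2\sum_r k_r\,x_r\,(y_r - a x_r - b) = 0 .

% With deviations measured from the general average,
% \sum_r k_r x_r = 0 = \sum_r k_r y_r, so the first condition gives
b \sum_r k_r = 0 \;\Rightarrow\; b = 0 ,

% and the second condition then reduces to
\sum_r k_r\,(x_r y_r - a x_r^2) = 0
\;\Rightarrow\;
a = \frac{\sum_r k_r x_r y_r}{\sum_r k_r x_r^2} .
```

Since $y_r$ is the average of the $k_r$ values of y in its group, $\sum_r k_r x_r y_r$ is the sum of $xy$ over all the pairs when every x is taken at the centre of its group, and $\sum_r k_r x_r^2 = n\sigma_1^2$ on the same footing.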

Consideration of the nature of the formula will, I think, lead
to the conclusion, that the coefficient of correlation calculated
by the formula is a good measurement of correlation, whatever
curves of frequency you are dealing with; and it is surprising
how very rapidly a small extent of correlation makes itself
felt, even when you deal with quite a few examples. If n is
only 20, you will soon find whether there is correlation or
not by this formula. If you select groups where there is no
correlation the criterion, discussed below, shows that the
correlation is not significant; but directly there is likely to be
correlation between the groups, this formula for r shows it.
The coefficient of correlation can be used then in a very large
region of cases in which it is required to test the connection
between two series of phenomena. In particular, it can be
used to decide whether two series of phenomena are entirely
unconnected or not, which subject necessitates a preliminary
treatment of the nature of series.



MEASUREMENT OF SERIES. 


SIXTH LECTURE. 


Series. 

I PROPOSE to deal in this lecture, first of all, with series
in general; and then with the comparison of and correlation
between two series. By a series I understand a list of
numerical events recorded at regular intervals, for example,
recorded once every year. In representing a series by a
diagram we measure time on the horizontal axis, and
dividing it up into years, we erect an ordinate at the point
corresponding to each year, to represent on a suitable scale
the magnitude at that particular year. The question whether
we should represent those magnitudes by dots or lines or
rectangles is important, but it is decided on the principles
discussed when we were dealing with the representations of
groups, and we need spend no more time on the analysis now.
Perhaps the most natural way of representing such series is to
erect a series of rectangles whose areas are proportional to
the successive magnitudes; but if we leave the diagram in
that form it will not be very clear, it will be very ugly, and
certainly this is not a neat way of finishing the representation.
The next step is to draw a continuous line to replace the
rectangles; the commonest way of doing this is to mark the
middle points of the tops of the rectangles, and join those
points by straight lines; but this method is erroneous, for the
same reason that it was erroneous in the representation of a
group. We need to draw a continuous line so that the areas
contained in the rectangles in the first place, and by the
curved trapeziums in the second, shall be equal in every case;
but this more correct curve is practically coincident with the
erroneous straight lines; it makes very little difference in
practice which of the two we draw; they give certainly the
same optical impression. If, however, we do replace the
rectangles by a continuous line we are making an assumption
which is sometimes justified, but sometimes not; by the fact
of drawing a continuous line we give the impression that the
event represented is continually taking place. This is correct
in representations of births, deaths, and marriages, and it is
partly correct in representing imports and exports by curves,
but it is not correct in the representation of events which
only occur once each year. These are details which are easily
analysed.

Classification.


The series, or the curves which represent them, can be
divided into three main classes: periodic curves, symptomatic
curves, and others; or instead of "others," we may say curves
with random fluctuations. Periodic curves are those where
similar fluctuations recur at equal intervals of time, as the
annual fluctuation of temperature recorded month by month.
Symptomatic curves are those which have a definite tendency up
or down, a "symptom," though short periods may obscure it,
as the death rate since 1870. A curve, which is neither
periodic nor symptomatic, may often be regarded as having
random fluctuations about a stationary average, as a curve
representing the annual averages of any meteorological
phenomenon, such as average temperature year by year. In
the Diagram XIII* all four curves are symptomatic; the first
three are downwards, and the last upwards for the first 30
years and then nearly level. The series represented in
Diagram XIV has apparently random fluctuations. These
curves are not periodic in any strict sense.


Periodic Curves.

The first thing to discuss is, how to disentangle the period
from the symptom when a periodic curve is also symptomatic,
or how to measure the period if the curve is not symptomatic.
There is not space to discuss the matter completely, and I want
rather to indicate the methods, and leave their consideration
* See p. 81.



to the reader. A curve often suggests two things: first, that there
is a regular period, and, secondly, that there is a movement
apart from the period. Assume that we are dealing with
monthly observations and an annual period. To obtain the
movement apart from the period, take the averages of the
12 months of each year and mark them on the diagram;
these points would show the average rate for the year, when
the readings of the vertical scale have been adjusted. But
there is something arbitrary in beginning the year at the 1st
of January. The deaths, births, and marriages, and any other
figures we deal with are probably independent of that
particular beginning of the year, and if we make comparisons it
may be better to take other periods to start with; for instance,
the fiscal year begins on April the 5th. We want a continuous
representation, which we can obtain as follows: First take
the average from January 1st to December 31st. Then the
average from the 1st of February to January 31st, and so on
until we get 12 dots every year. It is clear that the curve
through these points cannot have any sudden fluctuations; the
curve so obtained shows the symptom when the period is
eliminated. The theory underlying this method is quite
simple. If we take any particular 12 months, we shall
include the whole influence of the period, the excess in one
part and the defect in another, and if we average
them we shall probably get the number which would
have occurred if there had been no period, and if
the flow had been regular. It is approximate only, because
the various small fluctuations will affect the average, and it
can be improved by smoothing the curve. If the series is not
symptomatic the resulting smooth curve should be a horizontal
straight line.
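A minimal sketch of this running twelve-month average is given here, using a made-up monthly series (a steady rise plus a seasonal swing that sums to zero over the year); the function name and the figures are illustrative assumptions.

```python
def twelve_month_averages(monthly):
    """Average of every run of 12 consecutive months:
    January to December, February to January, and so on."""
    return [sum(monthly[i:i + 12]) / 12
            for i in range(len(monthly) - 11)]

# Hypothetical data: a rise of 0.1 per month (the "symptom") plus a
# seasonal excess and defect that cancel over any 12 months (the "period").
seasonal = [3, 2, 1, 0, -1, -2, -3, -2, -1, 0, 1, 2]
monthly = [20 + 0.1 * m + seasonal[m % 12] for m in range(36)]

trend = twelve_month_averages(monthly)
steps = [later - earlier for earlier, later in zip(trend, trend[1:])]
print(len(trend), round(trend[0], 2), round(steps[0], 2))
```

In this artificial case each run of twelve months contains the whole period exactly once, so the averages rise by a constant 0.1 per month and the period disappears; with real figures the small irregular fluctuations would remain and call for further smoothing.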

Now, in order to measure the period as apart from the
symptom, the only method is to write down the rates for the
50 Januaries which we may be dealing with, and take the
arithmetical average, the mode, or the median of those; to
repeat the process with the Februaries, and so on; and then
to represent the successive averages for the 12 months by a
separate curve, which is best drawn with a base line through
the general average of all the data. We thus get such a
curve as that given by the graph of $y = \sin x$ from 0° to 360°.
The justification of the method is simple. In the 50 Januaries
we include one January from each part in the symptomatic
curve. All the excesses due to the symptomatic tendency will
be counter-balanced by the defects, or will tend to be
counter-balanced by the defects, due also to symptomatic tendency.
They will only tend to be counter-balanced; for if we take
the 50 Januaries we include among them some extraordinary
months, and some months whose deviation from the annual
average is quite small. The accuracy with which we may
expect to get the true January reading is proportional to the
square root of the number of times taken, from the theory of
averages discussed above. In carrying out the method, we
implicitly assume that the causes which decide the symptom
and the causes which decide the period are independent,
while generally they are not independent. If there is an
increasing death rate or an increasing want of employment
at the same time that the winter is especially severe the one
will accentuate the other. It is very easy to see how the
result may be affected. Suppose some industrial disaster
throws a great proportion out of work in August in one year,
so as to increase the percentage of unemployed, we will say
to 50, then when taking the average for ten years, that
figure alone gives a rate of 5 per cent. in August, whereas
the excess had nothing to do with the fact that August was
the month concerned. If you take a sufficient number of
years, however, these things will tend to equalize one
another, and if we use the median instead of the arithmetic
average extraordinary occurrences have little effect. For
this reason it is best to estimate the period from the medians.
In the end we shall not get a smooth curve for our averages,
and may have to smooth that by a trigonometrical function,
or by some other method.
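The August illustration can be tried numerically. The sketch below assumes ten hypothetical years with a flat 2 per cent. of unemployed in every month except one disastrous August at 50; the figures and names are invented for the illustration.

```python
from statistics import mean, median

def monthly_profile(years, summary=median):
    """years is a list of 12-item lists (one per year);
    returns one summary figure for each calendar month."""
    return [summary([year[m] for year in years]) for m in range(12)]

# Hypothetical data: ten years at 2 per cent. throughout,
# except a single August (index 7) at 50 per cent.
years = [[2.0] * 12 for _ in range(10)]
years[3][7] = 50.0

by_mean = monthly_profile(years, mean)
by_median = monthly_profile(years, median)
print(round(by_mean[7], 2), by_median[7])
```

The arithmetic average for August is pulled up to 6.8 by the single extraordinary year, while the median stays at 2.0, which is the reason given above for preferring the medians.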

Symptomatic Series.

We will now discuss the symptomatic curves; the top
curve in Diagram XIII (male death-rate) will do as well
as any as an illustration, for the method of dealing with this
curve applies to a very great number of such curves. All
statistics representing sociological phenomena that I have
had experience of are symptomatic. Perhaps in very rare
cases you will find no symptom, but in general there is a
symptom; however remotely connected the figures are with
the general progress of civilization, you will find there is
some symptom up or down, or alternately up and down. In
general we may assume a symptom in all figures relating to
human society. In dealing with such curves, we sometimes
want to examine them in detail for a short period; but very
often we are more concerned with the symptom, especially in
forecasting events. In curve A in Diagram XIII there are
considerable and rapid fluctuations, but there is also distinct
optical evidence of a fall in the rate beginning between 1865
and 1870. The causes which produced the actual size of the
ordinate are, of course, very many, and it is impossible to
draw the line between those which tend to make a gradual
permanent change, and those which tend to make a sudden
temporary change. It is a question of degree and not of
character, and for that reason alone it is impossible to give
any theoretic solution for distinguishing the symptom from
the small fluctuations, just as it is impossible to give any
general solution to the interpolation problem. We have then
to find an empirical solution, one that satisfies our immediate
needs. It might appear best to draw a straight line, which
on the whole shall differ from the observations as little as
possible, and which could be determined by the method of
least squares; this would assume a symptomatic tendency to
equal increments or decrements in successive years. Or we
might assume a parabolic curve or logarithmic curve. A
recent American writer has assumed that a certain series can
be represented by $y = ba^n$, the compound interest equation.
But I think in general there is no reason to assume any
definite algebraic law. The solution I should suggest (it is
a commonplace one) is similar to that I have just suggested for
the removal of the period. It is most easily understood by
an example.

The figures in the following table are from the
Registrar-General's Returns, or are calculated from the Statistical
Abstract.
Take the last group of figures, the death rate of females.
I take the average of the first five death rates, 20.1 in 1845
to 24.4 in 1849, namely, 22.6, and place it in the penultimate
column at the middle of the period, namely, the year 1847.
I begin again at the second year, 1846, and take the average
for 1846-1850, namely, 22.6 again, and place that at 1848, the
middle year of that period; and so on for 46 successive
periods. Then on the Diagram XIII, I have represented that
line of moving averages by the dotted line running through
the continuous line. I think it is clear that that line offers one
solution of the problem. In taking the average of any five
years we are equally likely to include the ups and downs of
their fluctuations. If there was a regular period, if the
fluctuations were five-yearly we should remove them entirely
in five years; it would be the obvious time to take. If we
were dealing with figures referring to industry and the period
was ten years, ten years would be the most appropriate length
of time to average, including as it would one contribution from
each part of the fluctuation. If there is no regular period
there is no rule to be given as to what number of years you
shall take; it is a matter of convenience. If the five-yearly
average gives you a curve with sharp angles and apparently
random fluctuations, increase the number of years. It is most
convenient to work with an odd number of years, for the
middle of the period then coincides with the middle of one of
the years; but, on the other hand, a period of ten years gives
arithmetical facilities. This method may, I think, be left
for consideration; I believe it will be seen that it offers a
solution of the problem. To complete it, I recommend
replacing the dotted line by a regular curve drawn very near
to it, smoothing out any little fluctuations which are left.
A curve thus drawn would fall from 1847 to 1858, and
rise for about seven years and then fall, fairly rapidly, to
about 1882, and more slowly afterwards. In the nature of
things we cannot fix exact years for the end of the rise or fall.
It is absolutely necessary to have some such method of
measuring the symptom before you can base any argument as
to the change in the quantity measured. That is very
important. For example, the curve D, which represents
imports, is a sharply fluctuating curve with a partial period.
If, to take a particular date, we had in 1879 looked at the
previous two years only we should have thought there was a
rapid fall in the average imports; but if we looked at the
history of the phenomenon we should have seen that it
appeared to be only part of a minor fluctuation, and in 1882
we should have seen that average imports had been stationary
on the whole for eight years.
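The five-yearly moving average can be sketched as below. Only the 1845 and 1849 rates (20.1 and 24.4) and the two averages of 22.6 come from the text; every other figure is invented so that the first two averages come out as stated, and the function name is an assumption.

```python
def centred_moving_average(years, values, span=5):
    """Average each run of `span` successive values and place the
    result against the middle year of the run (span should be odd)."""
    half = span // 2
    return {years[i + half]: sum(values[i:i + span]) / span
            for i in range(len(values) - span + 1)}

years = list(range(1845, 1855))
# 20.1 (1845) and 24.4 (1849) are quoted in the text; the rest are
# hypothetical fill-ins, not the Registrar-General's figures.
rates = [20.1, 22.0, 23.5, 23.0, 24.4, 20.1, 21.0, 22.0, 23.0, 21.5]

averages = centred_moving_average(years, rates)
print({year: round(value, 2) for year, value in averages.items()})
```

The first average falls against 1847 and the second against 1848, as in the lecture; the first two and last two years of the series receive no moving average, which is why the tendency of the most recent years cannot yet be judged.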

It is not possible to say at the moment whether a fall is of
a permanent nature or simply one of those little fluctuations
which characterize the phenomenon throughout the half
century. For instance, by 1908 we can perhaps judge of the
tendency in 1900, but cannot judge of the current year
because we have not enough information.

The deviations obtained by subtracting the instantaneous
average from the figures for each year are given in the last
column. The deviations for the first three groups of figures
in the table are calculated on a similar method. Those
deviations should have some affinity to the curve of error.
Great deviations should be rare compared with small deviations,
and the occurrence of small and great deviations should have
some such relation as the occurrence of great and small
deviations in the curve of error; but the agreement is not
likely to be close, for the deviations calculated here are not
independent one of the other; they are bound together by the
fact that the same number is used in forming five successive
averages, while the curve of error assumes that the things are
absolutely independent.

Correlation between Series. 

That is a very rapid discussion of a rather wide subject,
but I must lead on to the correlation between two sets of
figures. If we were dealing with a curve with no symptom
and no period, for instance, two sets of figures relating to the
weather, $x_1, \ldots, x_n$ representing the average temperature,
$y_1, y_2, \ldots, y_n$ representing the average wind velocity, the
correlation between these two should be calculated as already
described. If we were dealing with a periodic curve we
should replace the periodic curve by its line of average
before comparing it with another curve. If there is an
irregular period, then I think we should proceed as if we had
a symptomatic curve with no period. Of course, any two
periodic curves with the same period are correlated. Any two
sequences of events which are influenced by the annual
changes in the weather will give a strong degree of correlation
quite independently of anything else. That is a quantity
which in general will not be worth measuring, but when we
come to very irregular periods such as those which we find in
trade statistics, it is worth measuring the correlation even
through the periods; because it is not so obvious, for instance,
that all the fluctuations of exports are correlated with all the
fluctuations of imports, and that the two together are
correlated with the amount of employment.

A difficulty arises in dealing with many curves from the
fact that the successive deviations year by year are not
altogether independent. Many curves which deal with
sociological phenomena have fluctuations each of which
extends over several years, so that a rise in one year is more
often followed by a rise in the next than by a fall. Other
curves have the opposite character, that an excess in one year is
followed by defects in the other; for instance, if there is a
great death rate in one year we may expect a comparatively
small one in the following; and this absence of independence
should be kept in mind when we have to base arguments on
the resulting correlation. But apart from this, we could treat
the deviations from the moving line of averages as deviations
whose correlation we can fairly calculate.

There is a very great difficulty in working out the
correlation between symptomatic curves. If we do not take the
deviation from the line of averages, but take the deviations
from the average for the whole 50 years, any two symptomatic
curves will show correlation. If we take two things which
are absolutely disconnected, except that they are both
phenomena arising in the progress of society, and work out
the coefficient by the straightforward rule, we shall find there
is some correlation. If two curves have short fluctuations
which are correlated, but opposite symptoms, then owing to
the symptom apart from the fluctuations there would be
negative correlation, while owing to the fluctuations apart
from the symptom there would be positive correlation; and
when both are taken into account the correlation may be
positive, zero, or negative. It is therefore necessary to treat
the symptom separately from the short fluctuations. On the
whole there is not much benefit in measuring the correlation
coefficient for the symptoms; we should rather simply state that
the symptom is say 15° upward in one case and 10° downward
in the other. The useful measurement of the correlation
between two such curves is not that of the symptoms, but of
the deviations.
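The point can be illustrated with two invented series that share the same short fluctuations but have opposite symptoms. The sketch below is an assumption-laden example, not a calculation from the lectures: the data are artificial and the helper names are hypothetical.

```python
import math

def correlation(xs, ys):
    """Product-moment coefficient of correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def deviations_from_moving_average(values, span=5):
    """Each value less the average of the `span` values centred on it."""
    half = span // 2
    return [values[i + half] - sum(values[i:i + span]) / span
            for i in range(len(values) - span + 1)]

# Shared short fluctuation (sums to zero over five years), repeated.
wiggle = [1, -1, 2, -2, 0] * 6
rising = [0.5 * t + w for t, w in enumerate(wiggle)]    # upward symptom
falling = [-0.5 * t + w for t, w in enumerate(wiggle)]  # downward symptom

raw_r = correlation(rising, falling)
dev_r = correlation(deviations_from_moving_average(rising),
                    deviations_from_moving_average(falling))
print(round(raw_r, 2), round(dev_r, 2))
```

Measured from the average of the whole period the two series appear negatively correlated, because the opposite symptoms dominate; measured from their moving averages the shared fluctuations give a coefficient of unity.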

Another question which arises very often in a practical
way is, whether we should compare the deviation for the
whole figures, say imports, with the deviation for the other,
say the marriage rate, in the same year, or in the next year.
Can we correlate the imports of 1847 with the marriage rate
of 1847, or should it be taken in comparison with that of
1848? That question will often occur, especially between
marriage and birth rates. Mr. Hooker has suggested,* a
suggestion which has been made independently in America,
that we should work out the coefficients of correlation on
the hypothesis of synchronism, and on alternate hypotheses
that one event follows half or one year after the other, and
see which correlation is the greatest. In this way we should
get a series of correlation coefficients according to the
dates we take.
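A minimal sketch of this comparison of hypotheses follows, for whole-year lags only (a half-year lag would need half-yearly figures). The deviations are invented, with the second series constructed to follow the first by one year; the function names are assumptions.

```python
import math

def correlation(xs, ys):
    """Product-moment coefficient of correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlations(first, second, max_lag):
    """Correlate first[t] with second[t + lag] for lag = 0 .. max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        a = first[:len(first) - lag]
        b = second[lag:]
        result[lag] = correlation(a, b)
    return result

# Hypothetical deviations; the second series repeats the first
# one year later.
imports_dev = [1, -2, 3, 0, -1, 2, -3, 1, 0, -2, 2, 1]
marriage_dev = [0] + imports_dev[:-1]

by_lag = lagged_correlations(imports_dev, marriage_dev, 2)
best_lag = max(by_lag, key=by_lag.get)
print(best_lag, {lag: round(r, 2) for lag, r in by_lag.items()})
```

The lag giving the greatest coefficient is the one adopted; here it is one year, as the data were constructed.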

Before we proceed to measure correlation by mathematical
formulae we should observe it purely graphically; and the
graphic representation of series will often suggest the
existence of correlation, which can then be measured by the
mathematical formula. The curves A and B in Diagram XIII
are obviously closely correlated. In the curves B and C we
cannot decide from the figure whether there is correlation or
not; at any rate the evidence of correlation is not so great.
In the curves B and D, I do not think we could decide from
the figure as drawn, though we might perhaps from a figure
drawn in a different way, whether there was correlation or
not.

Let us proceed to discuss how to put two curves down so
as to get optical evidence as to whether they are correlated
or not. Instead of measuring figures as in Diagram XIII
measure as in Diagram XIV. Plot out the deviations
calculated on p. 79 above and below a base line
representing zero; but before doing so it is necessary to choose
the relative scales of the two quantities so as to have a
definite relation the one to the other. There is no obvious
way of comparing pounds sterling with one per thousand in
the marriage rate. The way which naturally suggests itself
* Journal of the Royal Statistical Society, Sept. 1901. See especially
pp. 490-1. I think that Mr. Hooker was also the first to publish a calculation
of correlation based on deviations from a moving average.
and it is very useful for making the optical evidence of
correlation vivid, is to represent the standard deviation for
each group by unity on the vertical scale. The standard
deviation for the death rate of males is .830; of females it
is .803; so we represent the deviation .830 for males and
.803 for females by the same vertical line. If we were
doing the same thing for imports and male death rates, we

a 2 



should represent by the .same vertical scale *386 of a pound 
sterling and *83 death rate. This method has beon applied in 
Diagram XTV, for the comparison of linos A and B in 
Diagram XIII. I have put in the death rate for males, 
represented by the zigzag line, but I found I could not enter 
the death rate for females in the same way and make the 
lines distinct, and therefore have drawn short horizontal 
lines to show the death rate for females year by year ; the 
clots and lines representing' the two series are in nearly ovory 
year close together. The optical evidence of correlation is 
very great indeed. 
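
The rescaling described above can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: the function names and the two short series are hypothetical and are not the figures of the table on p. 79, and the standard deviation is taken as the root of the mean squared deviation.

```python
import math

def standard_deviation(deviations):
    """Root of the mean squared deviation (deviations already measured
    from the average or moving average of the series)."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))

def to_common_scale(deviations):
    """Express each deviation as a multiple of the series' own standard
    deviation, so that one standard deviation is unity on the plot."""
    sd = standard_deviation(deviations)
    return [d / sd for d in deviations]

# Hypothetical deviations in unlike units (NOT the lecture's data).
imports = [0.30, -0.50, 0.10, 0.40, -0.30]   # pounds sterling per head
death_rate = [-0.9, 1.2, 0.1, -1.0, 0.6]     # per thousand

imports_scaled = to_common_scale(imports)
death_scaled = to_common_scale(death_rate)
```

After this step each series has a standard deviation of 1, so the two can be drawn against the same vertical scale whatever their original units.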

The illustration taken is of two series where the 
correlation is nearly perfect; in less perfect cases we can get 
evidence by noticing whether the maxima and minima occur 
at the same dates in the two series. For example, if we took 
the value of exports and percentage of unemployed we 
should find perhaps that the maximum of the one came at the 
same time as the minimum of the other throughout, and that 
would give strong optical evidence of negative correlation. 
A method of testing whether there was correlation or not, 
which would naturally suggest itself to anyone who has a 
small knowledge of probability, would be to see how often a 
positive deviation of the one agreed with a positive deviation 
of the other; how often like signs concurred and how often 
unlike signs concurred. If we wrote down 50 + and − signs 
at random and another 50 alongside, the chances of getting 
various numbers of agreements are easily calculated, and are 
in fact the successive terms in the expansion of 
$(\tfrac{1}{2}+\tfrac{1}{2})^{n}$; but 
that is not a good method, for it does not take into account 
one of the most important considerations, whether a great 
fluctuation of the one corresponds with a great fluctuation of 
the other or not. We should get equal evidence of correlation 
by this method when we had a resemblance of this sort in 
two curves where great fluctuations corresponded with great 
and small with small throughout, and when the correspondence 
was in sign only. These things are obviously not of the 
same importance, and so the method of merely counting signs 
will not take us very far. 
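
As a minimal sketch of the sign-counting calculation just mentioned, the chances of the various numbers of agreements between two random columns of 50 signs can be computed from the binomial terms. The function and variable names are assumptions made for illustration.

```python
from math import comb

def chance_of_agreements(k, n):
    """Chance of exactly k agreements in n pairs of random signs:
    the (k+1)-th term of the expansion of (1/2 + 1/2)^n."""
    return comb(n, k) / 2 ** n

n = 50
distribution = [chance_of_agreements(k, n) for k in range(n + 1)]

# Chance that random signs agree in 35 or more of the 50 places.
chance_35_or_more = sum(distribution[35:])
```

The calculation uses the signs alone and takes no account of the size of the fluctuations, which is the weakness of the method noted in the text.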

We will, then, proceed by the method of calculating 
correlation described above. Referring now to the table on 
p. 79, it should be remarked that the method of evaluating 
the value of imports changes at the year 1852; I had to 
approximate before that year from the values of the exports, 
because the figures given by the Board of Trade before and 
after that date are calculated on different methods, and are 
not comparable. Otherwise the figures of total value of 
imports for home consumption to the United Kingdom are 
comparable. To my mind the imports are more significant 
than the exports; and also it seems to me absurd to add 
imports and exports; I do not think you can add them 
together any more than you can add bread to butter. I have 
taken the imports only, and, without criticising the figures in 
detail, I have divided by the gross population as given in the 
Statistical Abstract. Thus we get the amount per head 
given in the first column in the table. I have only intended 
to work to the second place of decimals. The death and 
marriage rates are taken from the Registrar-General's Report 
for 1895, which gives the figures for the previous 50 years. 
The standard deviations given at the bottom of the table are 
obtained by taking the square root of the sum of the squares 
of the 46 deviations given, divided by 46, as in the ordinary 
formula for standard deviation. The standard deviation is 
essentially an absolute quantity without sign. 
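
In symbols, the rule for the standard deviation stated above may be written as follows. The notation is assumed here for illustration: $d_1, d_2, \ldots, d_n$ stand for the deviations of one column of the table and $n$ for their number.

$$\sigma = \sqrt{\frac{d_1^{2} + d_2^{2} + \cdots + d_n^{2}}{n}}$$

Since every term under the root is a square, $\sigma$ cannot be negative, which is the sense in which it is a quantity without sign.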

I have calculated the coefficients of correlation between 
groups 1 and 2 (imports and marriage rate), between groups 
1 and 3 (imports and death rates), between 2 and 3 (marriage 
and death rates), and between 3 and 4 (death rates for males 
and females). I have intended to choose cases where, a priori, 
we might expect small correlation, no correlation, and great 
correlation. A priori we should expect correlation in the 
positive sense between imports and the marriage rate; not 
that increased imports cause an increase of the marriage rate, 
but the causes which produce prosperity are likely to have 
effect in increasing both imports and the marriage rate; the 
complexus of causes which decide the two things have 
something in common. The coefficient is .65. The marriage 
rate and death rate have presumably very little in common. 
One certainly could not say to start with whether an 
increasing death rate would synchronize with an increasing 
or with a diminishing marriage rate. The correlation between 
the two is −.19. The correlation between the imports and 
the death rate is −.22. The correlation between the death 
rate for males and that for females is +.99; it is practically 1, 
but the number 1 can only be obtained if there is an 
absolute proportion all through the scale, which there is not in 
this case. 
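
The remark that the value 1 requires exact proportion all through the scale can be illustrated with a short Python sketch. It assumes that the coefficient is the usual product-sum formula (the sum of the products of paired deviations, divided by the number of pairs and by both standard deviations); the formula itself is stated outside this passage, and the series below are hypothetical.

```python
import math

def correlation(u, v):
    """Coefficient of correlation for two series of deviations u and v,
    each already measured from its own average (assumed formula:
    sum of products / (n * sd_u * sd_v))."""
    n = len(u)
    sd_u = math.sqrt(sum(x * x for x in u) / n)
    sd_v = math.sqrt(sum(y * y for y in v) / n)
    return sum(x * y for x, y in zip(u, v)) / (n * sd_u * sd_v)

# Hypothetical deviations (NOT the lecture's data).
u = [1.0, -2.0, 0.5, 1.5, -1.0]
proportional = [0.97 * x for x in u]          # exact proportion throughout
nearly_so = [0.9, -2.1, 0.6, 1.3, -0.7]       # close, but not proportional

r_proportional = correlation(u, proportional)
r_nearly = correlation(u, nearly_so)
```

Only the exactly proportional series gives the value 1; any departure from proportion, however slight, gives something less.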


Criterion of Significance of the Correlation 
Coefficient. 


Now we are face to face with the question, What do 
these numerical values mean, and which of them are 
significant? It is clear that some such question arises, 
because if we write down two series absolutely at random 
and work out the formula for them the chances are very much 
against your obtaining zero, and there are heavy odds against 
obtaining a small number. Now the chance of obtaining a 
coefficient near zero increases with the number of terms. If 
we have two series, $u_1, u_2, \ldots, u_n$ and $v_1, v_2, \ldots, v_n$, measured 
from their averages, and we select a group of v's which are 
near to one another, the u's which will be their factors in 
forming the sum of the products are equally likely to be 
positive or negative; if we had an infinite number of those 
deviations their sum would be nothing; and the sum would 
tend to zero if we increased the number of terms, the actual 
deviation from zero being in inverse proportion to the square 
root of n, the number of terms. Hence the number of terms 
taken has much to do with the significance of the resulting 
coefficient of correlation, and we should expect that the 
quantity $\frac{1}{\sqrt{n}}$ would enter into the measurement of the 
significance of the coefficient of correlation. It is a little 
difficult to state and explain the measurement of the criterion 
of the significance; but it is absolutely necessary to make the 
attempt. Of the coefficients just given, the first and fourth 
are found to be significant, and the second and third not, 
when tested by the theoretical criterion. 
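
The dependence on the square root of the number of terms can be checked by a small experiment. The following is a sketch under stated assumptions: the series are drawn at random from a normal law by Python's `random` module, and the function names, the seed, and the numbers of terms and trials are arbitrary choices for illustration.

```python
import math
import random

def correlation(xs, ys):
    """Coefficient of correlation of two series, measured from their averages."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    u = [x - mean_x for x in xs]
    v = [y - mean_y for y in ys]
    sum_uv = sum(a * b for a, b in zip(u, v))
    return sum_uv / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

def spread_of_random_r(n, trials, rng):
    """Root-mean-square of the coefficients obtained from `trials` pairs of
    unconnected random series, each of n terms."""
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += correlation(xs, ys) ** 2
    return math.sqrt(total / trials)

rng = random.Random(1903)          # fixed seed so the run is repeatable
spread_25 = spread_of_random_r(25, 400, rng)
spread_400 = spread_of_random_r(400, 400, rng)
```

With sixteen times as many terms the spread of the chance coefficients should fall to roughly a quarter, in keeping with the inverse square root rule.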

Suppose we take two correlated groups, and that there is, 
as a matter of fact, a definite value for the coefficient of 
correlation; and then suppose we take 50 samples from each, 
that is to say, 50 pairs of events, we shall not naturally obtain 
exactly the coefficient of correlation that belongs to 
the whole groups. The chances are against obtaining 
exactly that result. Now, the deviations from the actual 
coefficient of correlation which are obtained by taking samples 
and finding the correlation have a curve of frequency 
$y = \frac{1}{c\sqrt{\pi}}\, e^{-x^{2}/c^{2}}$, where $y$ is the probability that the coefficient 
obtained differs by $x$ from the true coefficient, and $c$, the 
modulus, $= \frac{1-r^{2}}{\sqrt{n/2}}$, where $r$ is the result obtained from 
the sample group, which consists of $n$ pairs. The probable 
error in this curve of error is .67 of $\frac{1-r^{2}}{\sqrt{n}}$.* For example, in 
the coefficient between imports and the marriage rate, 
the calculated coefficient of correlation is .65, and the probable 
error for its curve of frequency is $.67 \times \frac{1-(.65)^{2}}{\sqrt{46}} = .056$. 
That is to say, from the calculation itself it is as likely as 
not that the actual coefficient is between .65 + .056 and 
.65 − .056. The chance of the true coefficient being as much 
as the modulus, namely .115, distant from the calculated .65 
is shown by the table of the error function to be only 16 in 
100; the chance of its being so far from .65 as to be actually 
zero is infinitesimal, for in the curve of error the cases 
where the deviation is as much as six times the modulus are 
practically non-existent. So that we have overwhelming 
evidence, if our general principle of calculation is correct, of 
correlation between the first and the second columns, and the 
most probable value of that correlation is two-thirds. In 
other words, the standard deviation of imports being £.387, 
and the standard deviation of the marriage rate .37, 
the most probable deviation of the marriage rate is 
+⅔ of .37 = .24, when we find a deviation in imports of 
+£.387, and so on in proportion. This statement should be 
connected with the graphic measurement of correlation 
discussed on p. 70. In the second case of correlation, that 
between imports and the male death rate, where the 
coefficient is −.22, the probable error by the method just 
described is .09. That is to say, our calculation means that it 
is as likely as not, from our evidence, that the correlation 
between those two series is between −.13 and −.31; the 
chance that the real correlation is zero or positive is quite 
perceptible. The chance from the table of the error 


* See Pearson, in Royal Soc. Trans., A. 175, p. 235; and corrections in 
Royal Soc. Proceedings, Oct. 18th, 1897; also Yule, in Statistical Journal, 
1897, p. 8‘ir7, 



function that a negative deviation as great as .22 should 
occur when the probable error is .09 is about one in ten. If 
we took ten groups which had zero, or slight positive 
correlation, in one of those groups you might expect to get 
such a result as −.22. Similarly, the chance that uncorrelated 
groups of 46 pairs should give the coefficient found 
between marriage and male death rate, namely, −.19, is one 
in six; that is to say, once in six groups which were not 
connected you would obtain that apparent correlation. The 
chance that you obtain the coefficient of correlation .99 from 
a random group is practically zero. That is to say, there is 
correlation between male and female death rates, and it is of 
such a nature that you could, given the deviation of death 
rate of males in the year, write down with very fair certainty 
the average death rate of females. For example, given that the 
death rate of males was +.5 in excess of the moving average, 
then the most probable death rate of females would be 
$\frac{.80}{.83} \times .5$, or .48 in excess of the average, and it is unlikely 
that any rate differing at all far from this will occur. 
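
The arithmetic of this criterion can be retraced in a short Python sketch. It rests on assumptions that should be stated plainly: the probable error is taken as .67 of $(1-r^{2})/\sqrt{n}$ with $n = 46$ pairs, the modulus as $\sqrt{2}$ times $(1-r^{2})/\sqrt{n}$, the chance of a deviation of either sign is read from the error function, and the most probable deviation of one quantity is taken in proportion to that of the other. The function names are hypothetical, and the results may differ slightly in the last decimal place from the rounded figures quoted in the text.

```python
import math

def probable_error(r, n, factor=0.67):
    """Probable error of a coefficient r found from n pairs."""
    return factor * (1 - r * r) / math.sqrt(n)

def modulus(r, n):
    """Modulus c of the curve of error for r found from n pairs."""
    return (1 - r * r) * math.sqrt(2.0 / n)

def chance_of_deviation(deviation, r, n):
    """Chance that the error of r is at least `deviation` in size,
    in either direction: 1 - erf(deviation / c)."""
    return math.erfc(abs(deviation) / modulus(r, n))

def most_probable_deviation(r, sd_x, sd_y, dev_x):
    """Most probable deviation of y when x deviates by dev_x."""
    return r * sd_y / sd_x * dev_x

n = 46
pe_marriage = probable_error(0.65, n)                      # imports and marriage rate
chance_modulus = chance_of_deviation(modulus(0.65, n), 0.65, n)
chance_death = chance_of_deviation(0.22, -0.22, n)         # imports and male death rate
marriage_dev = most_probable_deviation(0.65, 0.387, 0.37, 0.387)
female_dev = most_probable_deviation(0.99, 0.830, 0.803, 0.5)
```

The chance of an error as great as the modulus comes out at about 16 in 100 whatever the values of r and n, since it depends only on the ratio of the deviation to the modulus.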

We have thus found a way of measuring correlation, and 
of testing the significance of our measurement, between two 
groups and between two series. The method must be used 
with discretion. There is no time to discuss under what 
circumstances it is applicable, nor the further developments 
of the theory. 

Conclusion. 

In these lectures I have tried to indicate the common-sense 
treatment of curve drawing and averages on the one hand, and 
the more delicate and exact method of representing groups 
and series by quantities based upon algebraic work on the 
other. Directly we attempt to use the latter methods, the 
algebraic methods, we find that we are bound to make 
approximations that involve the use of the theory of 
probability and the theory of error, and I have therefore 
been compelled to deal with these theories. When I have been 
treating them I have not attempted to promulgate any 
original opinions; I have only tried to illustrate principles, 
which are already laid down, by new examples. But since 
the modern shape of the theories of probability and error is 
new, and involves some matters which are still controversial 
(so far as mathematical reasoning can be controversial), I 
have found it necessary to spend some little time in examining 
the foundations of the theories in some detail. I have only 
been able to deal with the beginnings of some of the difficult 
questions which arise, and I am sorry that for want of time I 
have been compelled to leave out many illustrations of the 
practical utility of the methods; I have had to spend time on 
the theory rather than on the practice. My object will have 
been completely attained if I have succeeded in indicating 
the scope and the interest of the application of the theory of 
error, a subject which urgently needs the co-operation of 
serious students, alike to calculate experimental data, which 
are very much wanting, and to criticize, establish, and enlarge 
the body of theory. 